Clean up repository: remove backup files and reorganize infrastructure components

This commit is contained in:
Houzhong Xu 2025-10-02 17:04:51 +00:00
parent e5aa00d6f9
commit 1c994f9f60
No known key found for this signature in database
GPG Key ID: B44BEB1438F1B46F
133 changed files with 1835 additions and 11296 deletions

View File

@ -1 +0,0 @@
/mnt/fnsync/mcp/mcp_shared_config.json

View File

@ -1,41 +0,0 @@
# MCP Configuration Sharing Scheme
This project implements a solution for sharing MCP (Model Context Protocol) configuration across multiple IDEs on multiple hosts, using an NFS volume for cross-host synchronization.
## Configuration Layout
- `/root/.mcp/mcp_settings.json` - main MCP configuration file (symlink pointing to the NFS volume)
- `/mnt/fnsync/mcp/mcp_shared_config.json` - unified configuration file on the NFS volume (authoritative source)
- `mcp_shared_config.json` - symlink pointing to the configuration file on the NFS volume
- `sync_mcp_config.sh` - sync script that copies the unified configuration to individual IDEs
- `sync_all_mcp_configs.sh` - full sync script that syncs to every supported IDE and AI assistant
- `.kilocode/mcp.json` - symlink pointing to the shared configuration
- Configuration files for other IDEs and AI assistants
## Unified Configuration Contents
The following MCP servers are merged:
### Standard servers
- context7: library documentation and code examples
- filesystem: file system access
- sequentialthinking: sequential thinking tool
- git: Git operations
- time: time-related operations
- memory: knowledge graph and memory management
- tavily: web search
## Usage
1. **Update the configuration**: edit `/mnt/fnsync/mcp/mcp_shared_config.json` to modify MCP server settings, or edit it through the symlink `/root/.mcp/mcp_settings.json`.
2. **Sync the configuration**:
   - Run `./sync_mcp_config.sh` to sync to a specific IDE
   - Run `./sync_all_mcp_configs.sh` to sync to all IDEs and AI assistants
3. **Verify the configuration**: confirm that MCP features work correctly in each IDE.
## Maintenance Notes
- All MCP configuration changes should be made in `/mnt/fnsync/mcp/mcp_shared_config.json` (the authoritative source).
- `/root/.mcp/mcp_settings.json` is now a symlink pointing to the unified configuration on the NFS volume.
- Because an NFS volume is used, configuration changes are shared across hosts automatically.
- When adding a new IDE, link its configuration file to, or copy it from, `/mnt/fnsync/mcp/mcp_shared_config.json` (a setup sketch follows below).
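A minimal sketch of wiring a new IDE into this scheme, assuming the IDE reads its MCP settings from a single JSON file; the IDE config path below is hypothetical and should be adjusted per tool:
```bash
#!/usr/bin/env bash
# Link a new IDE's MCP settings to the shared config on the NFS volume.
set -euo pipefail

SHARED=/mnt/fnsync/mcp/mcp_shared_config.json
IDE_CONFIG="$HOME/.config/some-ide/mcp.json"   # hypothetical path; adjust per IDE

# Back up any existing config, then replace it with a symlink to the shared file.
mkdir -p "$(dirname "$IDE_CONFIG")"
[ -f "$IDE_CONFIG" ] && cp "$IDE_CONFIG" "$IDE_CONFIG.bak"
ln -sf "$SHARED" "$IDE_CONFIG"

# For IDEs that cannot follow symlinks, copy instead (what sync_mcp_config.sh effectively does):
# cp "$SHARED" "$IDE_CONFIG"
```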

View File

@ -1,572 +0,0 @@
# 🏗️ Infrastructure Management Project
This is a modern multi-cloud infrastructure management platform focused on the integrated management of OpenTofu, Ansible, and Nomad + Podman.
## 📝 Important Reminder (Sticky Note)
### ✅ Consul Cluster Status Update
**Current status**: The Consul cluster is healthy and all nodes are running normally.
**Cluster information**:
- **Leader**: warden (100.122.197.112:8300)
- **Node count**: 3 server nodes
- **Health status**: all nodes pass their health checks
- **Node list**:
  - master (100.117.106.136) - master node, Korea
  - ash3c (100.116.80.94) - server node, USA
  - warden (100.122.197.112) - server node, Beijing (current cluster leader)
**Configuration status**:
- The Ansible inventory matches the actual cluster state
- All nodes run in server mode
- bootstrap_expect=3, which matches the actual number of nodes
**Dependencies**:
- Tailscale (day 1) ✅
- Ansible (day 2) ✅
- Nomad (day 3) ✅
- Consul (day 4) ✅ **completed**
- Terraform (day 5) ✅ **progressing well**
- Vault (day 6) ⏳ planned
- Waypoint (day 7) ⏳ planned
**Next steps**:
- Continue work on Terraform state management
- Prepare the Vault secrets-management integration
- Plan the Waypoint application deployment workflow
---
## 🎯 Project Features
- **🌩️ Multi-cloud support**: Oracle Cloud, Huawei Cloud, Google Cloud, AWS, DigitalOcean
- **🏗️ Infrastructure as Code**: manage cloud resources with OpenTofu
- **⚙️ Configuration management**: automate configuration and deployment with Ansible
- **🐳 Container orchestration**: Nomad cluster management with the Podman container runtime
- **🔄 CI/CD**: automated pipelines with Gitea Actions
- **📊 Monitoring**: Prometheus + Grafana monitoring stack
- **🔐 Security**: multi-layer security hardening and compliance
## 🔄 Architecture Layers and Responsibilities
### ⚠️ Important: Separating Terraform and Nomad Responsibilities
This project uses a layered architecture that clearly separates the responsibilities of each tool to avoid confusion:
#### 1. **Terraform/OpenTofu layer - infrastructure lifecycle management**
- **Responsibility**: manage the lifecycle of compute resources (virtual machines) provided by cloud vendors
- **Scope**:
  - Create, update, and delete VM instances
  - Manage network resources (VCNs, subnets, security groups, etc.)
  - Manage storage resources (block storage, object storage, etc.)
  - Manage load balancers and other cloud services
- **Goal**: ensure the underlying infrastructure is correctly configured and its state is managed
#### 2. **Nomad layer - application scheduling and orchestration**
- **Responsibility**: allocate resources and orchestrate applications inside VMs that are already running
- **Scope**:
  - Schedule and run containerized applications on existing VMs
  - Manage the application lifecycle (start, stop, update)
  - Allocate and limit resources (CPU, memory, storage)
  - Service discovery and load balancing
- **Goal**: run application services efficiently on existing infrastructure
#### 3. **Key differences**
- **Terraform** manages the lifecycle of the **virtual machines themselves**
- **Nomad** schedules the applications that run **inside the virtual machines**
- **Terraform** decides "which virtual machines exist"
- **Nomad** decides "which applications run on those virtual machines"
#### 4. **Workflow example**
```
1. Terraform creates virtual machines (cloud-provider level)
2. The VMs boot and run their operating systems
3. The Nomad client is installed and configured on each VM
4. Nomad schedules and runs application containers on the VMs
```
**Important reminder**: These two layers must not be mixed. Terraform should not manage application-level resources, and Nomad should not create virtual machines. Strictly following this layered architecture is key to the project's success.
## 📁 Project Structure
```
mgmt/
├── .gitea/workflows/      # CI/CD workflows
├── tofu/                  # OpenTofu infrastructure code (infrastructure lifecycle management)
│   ├── environments/      # Environment configs (dev/staging/prod)
│   ├── modules/           # Reusable modules
│   ├── providers/         # Cloud provider configs
│   └── shared/            # Shared configuration
├── configuration/         # Ansible configuration management
│   ├── inventories/       # Host inventories
│   ├── playbooks/         # Playbooks
│   ├── templates/         # Template files
│   └── group_vars/        # Group variables
├── jobs/                  # Nomad job definitions (application scheduling and orchestration)
│   ├── consul/            # Consul cluster configuration
│   └── podman/            # Podman-related jobs
├── configs/               # Configuration files
│   ├── nomad-master.hcl   # Nomad server configuration
│   └── nomad-ash3c.hcl    # Nomad client configuration
├── docs/                  # Documentation
├── security/              # Security configuration
│   ├── certificates/      # Certificates
│   └── policies/          # Security policies
├── tests/                 # Test scripts and reports
│   ├── mcp_servers/       # MCP server test scripts
│   ├── mcp_server_test_report.md  # MCP server test report
│   └── legacy/            # Legacy test scripts
├── tools/                 # Tools and utilities
├── playbooks/             # Core Ansible playbooks
└── Makefile               # Project management commands
```
**Layering notes**:
- The **tofu/** directory contains the Terraform/OpenTofu code and manages the lifecycle of compute resources provided by cloud vendors
- The **jobs/** directory contains Nomad job definitions and handles application scheduling inside existing VMs
- These two directories are strictly separated to keep responsibility boundaries clear
**Note:** The project has migrated from Docker Swarm to Nomad + Podman; the old swarm directory is no longer used. All intermediate scripts and test files have been cleaned up, keeping only the core configuration files in line with GitOps principles.
## 🔄 GitOps Principles
This project follows a GitOps workflow to keep infrastructure state consistent with the code in the Git repository:
- **Declarative configuration**: all infrastructure and application configuration is stored declaratively in Git
- **Version control and auditing**: every change goes through a Git commit, providing a complete change history and audit trail
- **Automated synchronization**: CI/CD pipelines automatically apply changes from Git to the real environment
- **State convergence**: the system continuously monitors actual state and automatically corrects any drift from the desired state
### GitOps Workflow
1. **Declare the desired state**: define the desired state of infrastructure and applications in Git
2. **Commit changes**: apply changes through Git commits
3. **Automatic sync**: the CI/CD system detects changes and applies them to the environment
4. **State verification**: the system verifies that actual state matches the desired state
5. **Monitoring and alerting**: state is monitored continuously and alerts fire when drift appears
This workflow ensures consistency, repeatability, and reliability of the environment while providing a complete change history and rollback capability. A minimal drift-check sketch follows below.
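A minimal drift-check sketch for the "state convergence" step, assuming the dev environment lives in `tofu/environments/dev` as shown in the project structure:
```bash
#!/usr/bin/env bash
# Detect drift between the Git-declared state and the real infrastructure.
set -uo pipefail

cd tofu/environments/dev
tofu plan -detailed-exitcode -out=drift.tfplan
case $? in
  0) echo "No drift: infrastructure matches the declared state." ;;
  2) echo "Drift detected: review drift.tfplan and reconcile via a Git commit." ;;
  *) echo "Plan failed; check provider credentials and the state backend." >&2; exit 1 ;;
esac
```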
## 🚀 Quick Start
### 1. Prepare the environment
```bash
# Clone the project
git clone <repository-url>
cd mgmt
# Check environment status
./mgmt.sh status
# Quick deployment (for development environments)
./mgmt.sh deploy
```
### 2. Configure cloud providers
```bash
# Copy the configuration template
cp tofu/environments/dev/terraform.tfvars.example tofu/environments/dev/terraform.tfvars
# Edit the configuration file and fill in your cloud provider credentials
vim tofu/environments/dev/terraform.tfvars
```
### 3. Initialize the infrastructure
```bash
# Initialize OpenTofu
./mgmt.sh tofu init
# Review the execution plan
./mgmt.sh tofu plan
# Apply infrastructure changes
cd tofu/environments/dev && tofu apply
```
### 4. Deploy Nomad services
```bash
# Deploy the Consul cluster
nomad run /root/mgmt/jobs/consul/consul-cluster.nomad
# Check Nomad jobs
nomad job status
# Check node status
nomad node status
```
### ⚠️ Important: Network Access Notes
**Tailscale network access**
- The Nomad and Consul services in this project are reached over the Tailscale network
- When accessing Nomad (port 4646) and Consul (port 8500), you must use the IP addresses assigned by Tailscale
- Wrong: `http://127.0.0.1:4646` or `http://localhost:8500` (connection fails)
- Correct: `http://100.x.x.x:4646` or `http://100.x.x.x:8500` (use the Tailscale IP)
**Getting the Tailscale IP**
```bash
# Show the Tailscale IP of the current node
tailscale ip -4
# Show all nodes in the Tailscale network
tailscale status
```
**Common issues**
- If you hit "connection refused", confirm that you are using the correct Tailscale IP
- Make sure the Tailscale service is up and running
- Check that network policies allow access to the relevant ports over the Tailscale interface
- For more detailed lessons and solutions, see: [Consul and Nomad access lessons learned](.gitea/issues/consul-nomad-access-lesson.md)
### 🔄 Nomad Cluster Leader Rotation and Access Strategy
**Nomad leader mechanism**
- Nomad uses the Raft protocol for distributed consensus; the cluster has exactly one leader node
- The leader handles all write operations and coordinates cluster state
- When the leader fails, the cluster automatically elects a new leader
**Access strategies during leader rotation**
1. **Discover the leader dynamically**
```bash
# Query the current leader
curl -s http://<any Nomad server IP>:4646/v1/status/leader
# Example response: "100.90.159.68:4647"
# Use the returned leader address for API calls
curl -s http://100.90.159.68:4646/v1/nodes
```
2. **Load-balancing options**
   - **DNS load balancing**: use Consul DNS and resolve `nomad.service.consul` to the current leader
   - **Proxy-layer load balancing**: add health checks in the Nginx/HAProxy configuration to route automatically to the active leader
   - **Client-side retries**: implement retry logic in client code and try other server nodes when a connection fails
3. **Recommended access pattern**
```bash
# Leader discovery script
#!/bin/bash
# Pick any Nomad server IP
SERVER_IP="100.116.158.95"
# Query the current leader
LEADER=$(curl -s http://${SERVER_IP}:4646/v1/status/leader | sed 's/"//g')
# Run commands against the leader address
nomad node status -address=http://${LEADER}
```
4. **High-availability configuration**
   - Add all Nomad server nodes to the client configuration
   - Clients automatically connect to an available server node
   - For write operations, clients are automatically redirected to the leader
**Notes**
- Nomad leader rotation is automatic and normally requires no manual intervention
- During a leader election the cluster may briefly be unable to process writes
- Implement appropriate retry logic in applications to handle transient failures during leader changes (a retry sketch follows below)
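A small retry wrapper along the lines suggested above; a sketch using server IPs already listed in this document:
```bash
#!/usr/bin/env bash
# Try each known Nomad server until one answers, then resolve the current leader.
set -uo pipefail

SERVERS=("100.116.158.95" "100.81.26.3" "100.103.147.94")  # any known Nomad server IPs

for ip in "${SERVERS[@]}"; do
  LEADER=$(curl -sf --max-time 3 "http://${ip}:4646/v1/status/leader" | tr -d '"') && break
done

if [ -z "${LEADER:-}" ]; then
  echo "No Nomad server reachable" >&2
  exit 1
fi

# The leader is reported as ip:4647 (RPC); use its IP with the HTTP port for API calls.
LEADER_IP=${LEADER%:*}
curl -s "http://${LEADER_IP}:4646/v1/nodes"
```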
## 🛠️ Common Commands
| Command | Description |
|------|------|
| `make status` | Show the project status overview |
| `make deploy` | Quickly deploy all services |
| `make cleanup` | Clean up all deployed services |
| `cd tofu/environments/dev && tofu <cmd>` | OpenTofu management commands |
| `nomad job status` | Show Nomad job status |
| `nomad node status` | Show Nomad node status |
| `podman ps` | List running containers |
| `ansible-playbook playbooks/configure-nomad-clients.yml` | Configure Nomad clients |
| `./run_tests.sh` or `make test-mcp` | Run all MCP server tests |
| `make test-kali` | Run the Kali Linux quick health check |
| `make test-kali-security` | Run the Kali Linux security tools tests |
| `make test-kali-full` | Run the full Kali Linux test suite |
## 🌩️ Supported Cloud Providers
### Oracle Cloud Infrastructure (OCI)
- ✅ Compute instances
- ✅ Networking (VCN, subnets, security groups)
- ✅ Storage (block storage, object storage)
- ✅ Load balancers
### Huawei Cloud
- ✅ Elastic Cloud Server (ECS)
- ✅ Virtual Private Cloud (VPC)
- ✅ Elastic Load Balance (ELB)
- ✅ Elastic Volume Service (EVS)
### Google Cloud Platform
- ✅ Compute Engine
- ✅ VPC networking
- ✅ Cloud Load Balancing
- ✅ Persistent Disk
### Amazon Web Services
- ✅ EC2 instances
- ✅ VPC networking
- ✅ Application Load Balancer
- ✅ EBS storage
### DigitalOcean
- ✅ Droplets
- ✅ VPC networking
- ✅ Load Balancers
- ✅ Block Storage
## 🔄 CI/CD Pipelines
### Infrastructure deployment flow
1. **Code commit** → triggers Gitea Actions
2. **OpenTofu Plan** → generates an execution plan
3. **Manual review** → changes are approved
4. **OpenTofu Apply** → infrastructure changes are applied
5. **Ansible deployment** → applications are configured and deployed
### Application deployment flow
1. **Application code update** → build the container image
2. **Image push** → push to the image registry
3. **Nomad job update** → update the job definition
4. **Nomad deployment** → rolling update of the service
5. **Health checks** → verify the deployment (see the gated plan/apply sketch below)
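A sketch of the plan → review → apply gate from the infrastructure flow, assuming it runs in a Gitea Actions job with the repository checked out at its root:
```bash
#!/usr/bin/env bash
# Plan → manual review → apply, mirroring steps 2-5 of the infrastructure flow.
set -euo pipefail

tofu -chdir=tofu/environments/dev init -input=false
tofu -chdir=tofu/environments/dev plan -input=false -out=tfplan    # step 2: generate the plan
# Step 3: a reviewer inspects the plan before the pipeline continues:
#   tofu -chdir=tofu/environments/dev show tfplan
tofu -chdir=tofu/environments/dev apply -input=false tfplan        # step 4: apply the reviewed plan
ansible-playbook playbooks/configure-nomad-clients.yml             # step 5: configure and deploy
```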
## 📊 Monitoring and Observability
### Monitoring components
- **Prometheus**: metrics collection and storage
- **Grafana**: visualization dashboards
- **AlertManager**: alert management
- **Node Exporter**: system metrics export
### Log management
- **ELK Stack**: Elasticsearch + Logstash + Kibana
- **Fluentd**: log collection and forwarding
- **Structured logging**: standardized JSON format
## 🔐 Security Best Practices
### Infrastructure security
- **Network isolation**: VPCs, security groups, firewalls
- **Access control**: IAM roles and policies
- **Data encryption**: in transit and at rest
- **Key management**: cloud provider key management services
### Application security
- **Container security**: image scanning, least privilege
- **Network security**: service mesh, TLS termination
- **Secrets management**: Docker Secrets, Ansible Vault
- **Security auditing**: log monitoring and auditing
## 🧪 Testing Strategy
### Infrastructure tests
- **Syntax checks**: OpenTofu validate
- **Security scanning**: Checkov, tfsec
- **Compliance checks**: OPA (Open Policy Agent)
### Application tests
- **Unit tests**: application code tests
- **Integration tests**: tests between services
- **End-to-end tests**: full workflow tests
### MCP Server Tests
The project includes a complete MCP (Model Context Protocol) server test suite in the `tests/mcp_servers/` directory:
- **context7 server tests**: verify initialization, tool listing, and search
- **qdrant server tests**: test adding, searching, and deleting documents
- **qdrant-ollama server tests**: verify the integration between the vector database and the LLM
The test scripts include shell and Python scripts and exercise MCP server functionality directly over the JSON-RPC protocol (a raw-request sketch follows after the commands below). Detailed results and fixes are documented in `tests/mcp_server_test_report.md`.
Run the tests:
```bash
# Run a single test script
cd tests/mcp_servers
./test_local_mcp_servers.sh
# Or run the Python tests
python test_mcp_servers_simple.py
```
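At their core, the shell tests send raw JSON-RPC requests to a server over stdin. A hand-rolled smoke check, assuming a stdio-based MCP server can be started with a command such as the filesystem server shown here (substitute the actual server command used in this repository):
```bash
#!/usr/bin/env bash
# Send an `initialize` request to a stdio MCP server and print the first reply line.
set -euo pipefail

SERVER_CMD="npx -y @modelcontextprotocol/server-filesystem /tmp"   # assumed; replace with the real server

printf '%s\n' \
  '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"smoke-test","version":"0.1"}}}' \
  | $SERVER_CMD | head -n 1
```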
### Kali Linux System Tests
The project includes a complete Kali Linux test suite in the `configuration/playbooks/test/` directory. The tests include:
1. **Quick health check** (`kali-health-check.yml`): basic system status checks
2. **Security tools tests** (`kali-security-tools.yml`): test the installation and functionality of various security tools
3. **Full system test** (`test-kali.yml`): comprehensive system tests and report generation
4. **Full test suite** (`kali-full-test-suite.yml`): run all tests in sequence
Run the tests:
```bash
# Kali Linux quick health check
make test-kali
# Kali Linux security tools tests
make test-kali-security
# Full Kali Linux test suite
make test-kali-full
```
## 📚 Documentation
- [Consul cluster troubleshooting](docs/consul-cluster-troubleshooting.md)
- [Disk management](docs/disk-management.md)
- [Nomad NFS setup](docs/nomad-nfs-setup.md)
- [Consul-Terraform integration](docs/setup/consul-terraform-integration.md)
- [OCI credentials setup](docs/setup/oci-credentials-setup.md)
- [Oracle Cloud setup](docs/setup/oracle-cloud-setup.md)
## 🤝 Contributing
1. Fork the project
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## 📄 License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
## 🆘 Support
If you run into problems or have questions:
1. Check the [documentation](docs/)
2. Search the [Issues](../../issues)
3. Open a new [Issue](../../issues/new)
## ⚠️ Important Lessons Learned
### Separating Terraform and Nomad Responsibilities
**Problem**: It is easy to blur the responsibilities of Terraform and Nomad in infrastructure management, which leads to a muddled architecture.
**Root cause**: Although Terraform and Nomad are both infrastructure management tools, they sit at different layers of the architecture and manage different kinds of resources.
**Solution**:
1. **Define the layers explicitly**
   - **Terraform/OpenTofu**: manages the lifecycle of compute resources (virtual machines) provided by cloud vendors
   - **Nomad**: schedules and orchestrates application workloads inside existing VMs
2. **Keep responsibility boundaries clear**
   - Terraform decides "which virtual machines exist"
   - Nomad decides "which applications run on those virtual machines"
   - Neither tool should manage the other's resources
3. **Separate the workflows**
```
1. Terraform creates virtual machines (cloud-provider level)
2. The VMs boot and run their operating systems
3. The Nomad client is installed and configured on each VM
4. Nomad schedules and runs application containers on the VMs
```
**Important reminder**: Strictly following this layered architecture is key to the project's success. Mixing the responsibilities of these two layers leads to architectural confusion and management problems.
### Consul and Nomad Access Issues
**Problem**: Accessing the Consul service via `http://localhost:8500` or `http://127.0.0.1:8500` fails to connect.
**Root cause**: In this project, Consul and Nomad run in the cluster via Nomad + Podman and are reached over the Tailscale network. These services do not run locally, so they cannot be reached through localhost.
**Solution**:
1. **Use the Tailscale IP**: services must be accessed via the IP addresses assigned by Tailscale
```bash
# Show the Tailscale IP of the current node
tailscale ip -4
# Show all nodes in the Tailscale network
tailscale status
# Access Consul (use the actual Tailscale IP)
curl http://100.x.x.x:8500/v1/status/leader
# Access Nomad (use the actual Tailscale IP)
curl http://100.x.x.x:4646/v1/status/leader
```
2. **Service discovery**: the Consul cluster has 3 nodes and the Nomad cluster has more than ten, so correctly identify the node a service runs on
3. **Cluster architecture**:
   - Consul cluster: 3 nodes (kr-master, us-ash3c, bj-warden)
   - Nomad cluster: more than ten nodes, including server and client nodes
**Important reminder**: During development and debugging, always use Tailscale IPs rather than localhost to reach cluster services. This is a basic requirement of the project architecture and must be strictly followed.
### Consul Cluster Configuration Management Lessons
**Problem**: The Consul cluster configuration files did not match the actual running state, leading to confusing cluster management and configuration errors.
**Root cause**: The node information in the Ansible inventory did not match the actual nodes in the Consul cluster, including node roles, node counts, and the expect value.
**Solution**:
1. **Verify cluster state regularly**: use the Consul API to check the actual cluster state and keep the configuration files consistent with it
```bash
# List the Consul cluster nodes
curl -s http://<consul-server>:8500/v1/catalog/nodes
# Show node details
curl -s http://<consul-server>:8500/v1/agent/members
# Show the cluster leader
curl -s http://<consul-server>:8500/v1/status/leader
```
2. **Keep configuration files consistent**: make sure all related inventory files (such as `csol-consul-nodes.ini`, `consul-nodes.ini`, `consul-cluster.ini`) stay consistent, including:
   - The list and number of server nodes
   - The list and number of client nodes
   - The `bootstrap_expect` value (it must match the actual number of server nodes)
   - Node roles and IP addresses
3. **Identify node roles correctly**: confirm each node's actual role through the API to avoid configuring a server node as a client, or vice versa
```json
// Example node information returned by the API
{
"Name": "warden",
"Addr": "100.122.197.112",
"Port": 8300,
"Status": 1,
"ProtocolVersion": 2,
"Delegate": 1,
"Server": true // confirms the node role
}
```
4. **Configuration update process**: when the configuration does not match the actual state, update it as follows:
   - Fetch the actual cluster state through the API
   - Update all related configuration files based on the actual state
   - Keep the information consistent across all configuration files
   - Update the descriptions and comments in the configuration files to reflect the latest cluster state
**Real-world example**:
- **Initial state**: the configuration files showed 2 server nodes and 5 client nodes, with `bootstrap_expect=2`
- **Actual state**: the Consul cluster ran 3 server nodes (master, ash3c, warden), no client nodes, with `expect=3`
- **Resolution**: update all configuration files to 3 server nodes, remove all client node entries, and set `bootstrap_expect` to 3
**Important reminder**: The Consul cluster configuration must stay strictly consistent with the actual running state. Any mismatch can destabilize the cluster or break functionality. Regularly verifying cluster state via the Consul API and updating the configuration files promptly is key to keeping the cluster stable. A small consistency-check sketch follows below.
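A consistency-check sketch for the lesson above, comparing the live server count against `bootstrap_expect`; the inventory path below is an assumption and should point at the real file:
```bash
#!/usr/bin/env bash
# Compare the number of live Consul server peers with bootstrap_expect in the inventory.
set -uo pipefail

CONSUL=http://100.122.197.112:8500                        # any reachable Consul server (Tailscale IP)
INVENTORY=configuration/inventories/consul-cluster.ini    # assumed path; adjust to the real file

live=$(curl -s "$CONSUL/v1/status/peers" | jq 'length')
expected=$(grep -oE 'bootstrap_expect *= *[0-9]+' "$INVENTORY" | grep -oE '[0-9]+' | head -n 1)

if [ "$live" != "$expected" ]; then
  echo "MISMATCH: $live server peers running, bootstrap_expect=$expected in $INVENTORY" >&2
  exit 1
fi
echo "OK: $live server peers match bootstrap_expect=$expected"
```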
## 🎉 Acknowledgements
Thanks to all the developers and community members who have contributed to this project!
## Script Organization
Project scripts have been reorganized by function under the `scripts/` directory:
- `scripts/setup/` - environment setup and initialization
- `scripts/deployment/` - deployment scripts
- `scripts/testing/` - test scripts
- `scripts/utilities/` - utility scripts
- `scripts/mcp/` - MCP server related scripts
- `scripts/ci-cd/` - CI/CD related scripts
See the [script index](scripts/SCRIPT_INDEX.md) for details.

View File

@ -0,0 +1,104 @@
---
# Ansible Playbook: deploy the Consul client to all Nomad nodes
- name: Deploy Consul Client to Nomad nodes
hosts: nomad_clients:nomad_servers
become: yes
vars:
consul_version: "1.21.5"
consul_datacenter: "dc1"
consul_servers:
- "100.117.106.136:8300" # master (韩国)
- "100.122.197.112:8300" # warden (北京)
- "100.116.80.94:8300" # ash3c (美国)
tasks:
- name: Update APT cache
apt:
update_cache: yes
- name: Install consul via APT (assumes the APT repository is already configured)
apt:
name: consul={{ consul_version }}-*
state: present
update_cache: yes
register: consul_installed
- name: Create consul user (if not exists)
user:
name: consul
system: yes
shell: /bin/false
home: /opt/consul
create_home: yes
- name: Create consul directories
file:
path: "{{ item }}"
state: directory
owner: consul
group: consul
mode: '0755'
loop:
- /opt/consul
- /opt/consul/data
- /etc/consul.d
- /var/log/consul
- name: Get node Tailscale IP
shell: ip addr show tailscale0 | grep 'inet ' | awk '{print $2}' | cut -d'/' -f1
register: tailscale_ip
failed_when: tailscale_ip.stdout == ""
- name: Create consul client configuration
template:
src: templates/consul-client.hcl.j2
dest: /etc/consul.d/consul.hcl
owner: consul
group: consul
mode: '0644'
notify: restart consul
- name: Create consul systemd service
template:
src: templates/consul.service.j2
dest: /etc/systemd/system/consul.service
owner: root
group: root
mode: '0644'
notify: reload systemd
- name: Enable and start consul service
systemd:
name: consul
enabled: yes
state: started
notify: restart consul
- name: Wait for consul to be ready
uri:
url: "http://{{ tailscale_ip.stdout }}:8500/v1/status/leader"
status_code: 200
timeout: 5
register: consul_leader_status
until: consul_leader_status.status == 200
retries: 30
delay: 5
- name: Verify consul cluster membership
shell: consul members -status=alive -format=json | jq -r '.[].Name'
register: consul_members
changed_when: false
- name: Display cluster status
debug:
msg: "Node {{ inventory_hostname.split('.')[0] }} joined cluster with {{ consul_members.stdout_lines | length }} members"
handlers:
- name: reload systemd
systemd:
daemon_reload: yes
- name: restart consul
systemd:
name: consul
state: restarted
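A usage sketch for this playbook; both file names below are assumptions (save the playbook and the inventory shown in the next file under whatever names the repository actually uses):
```bash
# Syntax-check first, then roll out to the Nomad server group before the clients.
ansible-playbook -i consul-client-inventory.yml deploy-consul-client.yml --syntax-check
ansible-playbook -i consul-client-inventory.yml deploy-consul-client.yml --limit nomad_servers
ansible-playbook -i consul-client-inventory.yml deploy-consul-client.yml          # all hosts
```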

View File

@ -0,0 +1,59 @@
---
# Ansible Inventory for Consul Client Deployment
all:
children:
consul_servers:
hosts:
master.tailnet-68f9.ts.net:
ansible_host: 100.117.106.136
region: korea
warden.tailnet-68f9.ts.net:
ansible_host: 100.122.197.112
region: beijing
ash3c.tailnet-68f9.ts.net:
ansible_host: 100.116.80.94
region: usa
nomad_servers:
hosts:
# Nomad server nodes also need the Consul client
semaphore.tailnet-68f9.ts.net:
ansible_host: 100.116.158.95
region: korea
ch3.tailnet-68f9.ts.net:
ansible_host: 100.86.141.112
region: switzerland
ash1d.tailnet-68f9.ts.net:
ansible_host: 100.81.26.3
region: usa
ash2e.tailnet-68f9.ts.net:
ansible_host: 100.103.147.94
region: usa
ch2.tailnet-68f9.ts.net:
ansible_host: 100.90.159.68
region: switzerland
de.tailnet-68f9.ts.net:
ansible_host: 100.120.225.29
region: germany
onecloud1.tailnet-68f9.ts.net:
ansible_host: 100.98.209.50
region: unknown
nomad_clients:
hosts:
# Nodes that need the Consul client deployed
influxdb1.tailnet-68f9.ts.net:
ansible_host: "{{ influxdb1_ip }}" # fill in the actual IP
region: beijing
browser.tailnet-68f9.ts.net:
ansible_host: "{{ browser_ip }}" # fill in the actual IP
region: beijing
# hcp1 already has a Consul client; it can optionally be reconfigured
# hcp1.tailnet-68f9.ts.net:
# ansible_host: 100.97.62.111
# region: beijing
vars:
ansible_user: root
ansible_ssh_private_key_file: ~/.ssh/id_rsa
consul_datacenter: dc1

View File

@ -0,0 +1,61 @@
# Consul Client Configuration for {{ inventory_hostname }}
datacenter = "{{ consul_datacenter }}"
data_dir = "/opt/consul/data"
log_level = "INFO"
node_name = "{{ inventory_hostname.split('.')[0] }}"
bind_addr = "{{ tailscale_ip.stdout }}"
# Client mode (not server)
server = false
# Connect to the Consul servers (the three-node cluster)
retry_join = [
"100.117.106.136", # master (Korea)
"100.122.197.112", # warden (Beijing)
"100.116.80.94" # ash3c (USA)
]
# Performance optimization
performance {
raft_multiplier = 5
}
# Ports configuration
ports {
grpc = 8502
http = 8500
dns = 8600
}
# Enable Connect for service mesh
connect {
enabled = true
}
# Cache configuration for performance
cache {
entry_fetch_max_burst = 42
entry_fetch_rate = 30
}
# Node metadata
node_meta = {
region = "{{ region | default('unknown') }}"
zone = "nomad-server"
}
# UI disabled for clients
ui_config {
enabled = false
}
# ACL configuration (if needed)
acl = {
enabled = false
default_policy = "allow"
}
# Logging
log_file = "/var/log/consul/consul.log"
log_rotate_duration = "24h"
log_rotate_max_files = 7
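After this template is rendered onto a node, the result can be checked before (re)starting the agent; a quick sketch:
```bash
# Validate the rendered configuration, restart the agent, and confirm it joined the cluster.
consul validate /etc/consul.d/
systemctl restart consul
consul members | grep "$(hostname -s)"
```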

View File

@ -0,0 +1,26 @@
[Unit]
Description=Consul Client
Documentation=https://www.consul.io/
Requires=network-online.target
After=network-online.target
ConditionFileNotEmpty=/etc/consul.d/consul.hcl
[Service]
Type=notify
User=consul
Group=consul
ExecStart=/usr/bin/consul agent -config-dir=/etc/consul.d
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
LimitNOFILE=65536
# Security settings
NoNewPrivileges=yes
PrivateTmp=yes
ProtectHome=yes
ProtectSystem=strict
ReadWritePaths=/opt/consul /var/log/consul
[Install]
WantedBy=multi-user.target

View File

@ -0,0 +1,19 @@
# Consul Configuration
## Deployment
```bash
nomad job run components/consul/jobs/consul-cluster.nomad
```
## Job Details
- **Job name**: `consul-cluster-nomad`
- **Type**: service
- **Nodes**: master, ash3c, warden
## Access
- Master: `http://master.tailnet-68f9.ts.net:8500`
- Ash3c: `http://ash3c.tailnet-68f9.ts.net:8500`
- Warden: `http://warden.tailnet-68f9.ts.net:8500`

View File

@ -1,412 +0,0 @@
job "consul-cluster-dynamic" {
datacenters = ["dc1"]
type = "service"
group "consul-master" {
count = 1
constraint {
attribute = "${node.unique.name}"
value = "kr-master"
}
network {
port "http" {
static = 8500
}
port "rpc" {
static = 8300
}
port "serf_lan" {
static = 8301
}
port "serf_wan" {
static = 8302
}
}
task "consul" {
driver = "exec"
# 使用模板生成配置文件
template {
data = <<EOF
# Consul配置文件 - 动态生成
# 此文件由consul-template根据Consul KV存储中的配置动态生成
# 基础配置
data_dir = "/opt/consul/data"
raft_dir = "/opt/consul/raft"
# 启用UI
ui_config {
enabled = true
}
# 数据中心配置
datacenter = "dc1"
# 服务器配置
server = true
bootstrap_expect = 3
# 网络配置
client_addr = "master"
bind_addr = "master"
advertise_addr = "master"
# 端口配置
ports {
dns = 8600
http = 8500
https = -1
grpc = 8502
grpc_tls = 8503
serf_lan = 8301
serf_wan = 8302
server = 8300
}
# 集群连接
retry_join = ["ash3c", "warden"]
# 服务发现
enable_service_script = true
enable_script_checks = true
enable_local_script_checks = true
# 性能调优
performance {
raft_multiplier = 1
}
# 日志配置
log_level = "INFO"
enable_syslog = false
log_file = "/var/log/consul/consul.log"
# 安全配置
encrypt = "YourEncryptionKeyHere"
# 连接配置
reconnect_timeout = "30s"
reconnect_timeout_wan = "30s"
session_ttl_min = "10s"
# Autopilot配置
autopilot {
cleanup_dead_servers = true
last_contact_threshold = "200ms"
max_trailing_logs = 250
server_stabilization_time = "10s"
redundancy_zone_tag = ""
disable_upgrade_migration = false
upgrade_version_tag = ""
}
# 快照配置
snapshot {
enabled = true
interval = "24h"
retain = 30
name = "consul-snapshot-{{.Timestamp}}"
}
# 备份配置
backup {
enabled = true
interval = "6h"
retain = 7
name = "consul-backup-{{.Timestamp}}"
}
EOF
destination = "local/consul.hcl"
}
config {
command = "consul"
args = [
"agent",
"-config-dir=local"
]
}
resources {
cpu = 300
memory = 512
}
}
}
group "consul-ash3c" {
count = 1
constraint {
attribute = "${node.unique.name}"
value = "us-ash3c"
}
network {
port "http" {
static = 8500
}
port "rpc" {
static = 8300
}
port "serf_lan" {
static = 8301
}
port "serf_wan" {
static = 8302
}
}
task "consul" {
driver = "exec"
# 使用模板生成配置文件
template {
data = <<EOF
# Consul配置文件 - 动态生成
# 此文件由consul-template根据Consul KV存储中的配置动态生成
# 基础配置
data_dir = "/opt/consul/data"
raft_dir = "/opt/consul/raft"
# 启用UI
ui_config {
enabled = true
}
# 数据中心配置
datacenter = "dc1"
# 服务器配置
server = true
bootstrap_expect = 3
# 网络配置
client_addr = "ash3c"
bind_addr = "ash3c"
advertise_addr = "ash3c"
# 端口配置
ports {
dns = 8600
http = 8500
https = -1
grpc = 8502
grpc_tls = 8503
serf_lan = 8301
serf_wan = 8302
server = 8300
}
# 集群连接
retry_join = ["master", "warden"]
# 服务发现
enable_service_script = true
enable_script_checks = true
enable_local_script_checks = true
# 性能调优
performance {
raft_multiplier = 1
}
# 日志配置
log_level = "INFO"
enable_syslog = false
log_file = "/var/log/consul/consul.log"
# 安全配置
encrypt = "YourEncryptionKeyHere"
# 连接配置
reconnect_timeout = "30s"
reconnect_timeout_wan = "30s"
session_ttl_min = "10s"
# Autopilot配置
autopilot {
cleanup_dead_servers = true
last_contact_threshold = "200ms"
max_trailing_logs = 250
server_stabilization_time = "10s"
redundancy_zone_tag = ""
disable_upgrade_migration = false
upgrade_version_tag = ""
}
# 快照配置
snapshot {
enabled = true
interval = "24h"
retain = 30
name = "consul-snapshot-{{.Timestamp}}"
}
# 备份配置
backup {
enabled = true
interval = "6h"
retain = 7
name = "consul-backup-{{.Timestamp}}"
}
EOF
destination = "local/consul.hcl"
}
config {
command = "consul"
args = [
"agent",
"-config-dir=local"
]
}
resources {
cpu = 300
memory = 512
}
}
}
group "consul-warden" {
count = 1
constraint {
attribute = "${node.unique.name}"
value = "bj-warden"
}
network {
port "http" {
static = 8500
}
port "rpc" {
static = 8300
}
port "serf_lan" {
static = 8301
}
port "serf_wan" {
static = 8302
}
}
task "consul" {
driver = "exec"
# 使用模板生成配置文件
template {
data = <<EOF
# Consul配置文件 - 动态生成
# 此文件由consul-template根据Consul KV存储中的配置动态生成
# 基础配置
data_dir = "/opt/consul/data"
raft_dir = "/opt/consul/raft"
# 启用UI
ui_config {
enabled = true
}
# 数据中心配置
datacenter = "dc1"
# 服务器配置
server = true
bootstrap_expect = 3
# 网络配置
client_addr = "warden"
bind_addr = "warden"
advertise_addr = "warden"
# 端口配置
ports {
dns = 8600
http = 8500
https = -1
grpc = 8502
grpc_tls = 8503
serf_lan = 8301
serf_wan = 8302
server = 8300
}
# 集群连接
retry_join = ["master", "ash3c"]
# 服务发现
enable_service_script = true
enable_script_checks = true
enable_local_script_checks = true
# 性能调优
performance {
raft_multiplier = 1
}
# 日志配置
log_level = "INFO"
enable_syslog = false
log_file = "/var/log/consul/consul.log"
# 安全配置
encrypt = "YourEncryptionKeyHere"
# 连接配置
reconnect_timeout = "30s"
reconnect_timeout_wan = "30s"
session_ttl_min = "10s"
# Autopilot配置
autopilot {
cleanup_dead_servers = true
last_contact_threshold = "200ms"
max_trailing_logs = 250
server_stabilization_time = "10s"
redundancy_zone_tag = ""
disable_upgrade_migration = false
upgrade_version_tag = ""
}
# 快照配置
snapshot {
enabled = true
interval = "24h"
retain = 30
name = "consul-snapshot-{{.Timestamp}}"
}
# 备份配置
backup {
enabled = true
interval = "6h"
retain = 7
name = "consul-backup-{{.Timestamp}}"
}
EOF
destination = "local/consul.hcl"
}
config {
command = "consul"
args = [
"agent",
"-config-dir=local"
]
}
resources {
cpu = 300
memory = 512
}
}
}
}

View File

@ -1,421 +0,0 @@
job "consul-cluster-kv" {
datacenters = ["dc1"]
type = "service"
group "consul-master" {
count = 1
constraint {
attribute = "${node.unique.name}"
value = "kr-master"
}
network {
port "http" {
static = 8500
}
port "rpc" {
static = 8300
}
port "serf_lan" {
static = 8301
}
port "serf_wan" {
static = 8302
}
}
task "consul" {
driver = "exec"
# 使用模板从Consul KV获取配置
template {
data = <<EOF
# Consul配置文件 - 从KV存储动态获取
# 遵循 config/{environment}/{provider}/{region_or_service}/{key} 格式
# 基础配置
data_dir = "{{ keyOrDefault `config/dev/consul/cluster/data_dir` `/opt/consul/data` }}"
raft_dir = "{{ keyOrDefault `config/dev/consul/cluster/raft_dir` `/opt/consul/raft` }}"
# 启用UI
ui_config {
enabled = {{ keyOrDefault `config/dev/consul/ui/enabled` `true` }}
}
# 数据中心配置
datacenter = "{{ keyOrDefault `config/dev/consul/cluster/datacenter` `dc1` }}"
# 服务器配置
server = true
bootstrap_expect = {{ keyOrDefault `config/dev/consul/cluster/bootstrap_expect` `3` }}
# 网络配置
client_addr = "{{ keyOrDefault `config/dev/consul/nodes/master/hostname` `master` }}"
bind_addr = "{{ keyOrDefault `config/dev/consul/nodes/master/hostname` `master` }}"
advertise_addr = "{{ keyOrDefault `config/dev/consul/nodes/master/hostname` `master` }}"
# 端口配置
ports {
dns = {{ keyOrDefault `config/dev/consul/ports/dns` `8600` }}
http = {{ keyOrDefault `config/dev/consul/ports/http` `8500` }}
https = {{ keyOrDefault `config/dev/consul/ports/https` `-1` }}
grpc = {{ keyOrDefault `config/dev/consul/ports/grpc` `8502` }}
grpc_tls = {{ keyOrDefault `config/dev/consul/ports/grpc_tls` `8503` }}
serf_lan = {{ keyOrDefault `config/dev/consul/ports/serf_lan` `8301` }}
serf_wan = {{ keyOrDefault `config/dev/consul/ports/serf_wan` `8302` }}
server = {{ keyOrDefault `config/dev/consul/ports/server` `8300` }}
}
# 集群连接 - 从KV获取其他节点IP
retry_join = [
"{{ keyOrDefault `config/dev/consul/nodes/ash3c/hostname` `ash3c` }}",
"{{ keyOrDefault `config/dev/consul/nodes/warden/hostname` `warden` }}"
]
# 服务发现
enable_service_script = {{ keyOrDefault `config/dev/consul/service/enable_service_script` `true` }}
enable_script_checks = {{ keyOrDefault `config/dev/consul/service/enable_script_checks` `true` }}
enable_local_script_checks = {{ keyOrDefault `config/dev/consul/service/enable_local_script_checks` `true` }}
# 性能调优
performance {
raft_multiplier = {{ keyOrDefault `config/dev/consul/performance/raft_multiplier` `1` }}
}
# 日志配置
log_level = "{{ keyOrDefault `config/dev/consul/cluster/log_level` `INFO` }}"
enable_syslog = {{ keyOrDefault `config/dev/consul/log/enable_syslog` `false` }}
log_file = "{{ keyOrDefault `config/dev/consul/log/log_file` `/var/log/consul/consul.log` }}"
# 安全配置
encrypt = "{{ keyOrDefault `config/dev/consul/cluster/encrypt_key` `YourEncryptionKeyHere` }}"
# 连接配置
reconnect_timeout = "{{ keyOrDefault `config/dev/consul/connection/reconnect_timeout` `30s` }}"
reconnect_timeout_wan = "{{ keyOrDefault `config/dev/consul/connection/reconnect_timeout_wan` `30s` }}"
session_ttl_min = "{{ keyOrDefault `config/dev/consul/connection/session_ttl_min` `10s` }}"
# Autopilot配置
autopilot {
cleanup_dead_servers = {{ keyOrDefault `config/dev/consul/autopilot/cleanup_dead_servers` `true` }}
last_contact_threshold = "{{ keyOrDefault `config/dev/consul/autopilot/last_contact_threshold` `200ms` }}"
max_trailing_logs = {{ keyOrDefault `config/dev/consul/autopilot/max_trailing_logs` `250` }}
server_stabilization_time = "{{ keyOrDefault `config/dev/consul/autopilot/server_stabilization_time` `10s` }}
redundancy_zone_tag = ""
disable_upgrade_migration = {{ keyOrDefault `config/dev/consul/autopilot/disable_upgrade_migration` `false` }}
upgrade_version_tag = ""
}
# 快照配置
snapshot {
enabled = {{ keyOrDefault `config/dev/consul/snapshot/enabled` `true` }}
interval = "{{ keyOrDefault `config/dev/consul/snapshot/interval` `24h` }}"
retain = {{ keyOrDefault `config/dev/consul/snapshot/retain` `30` }}
name = "{{ keyOrDefault `config/dev/consul/snapshot/name` `consul-snapshot-{{.Timestamp}}` }}"
}
# 备份配置
backup {
enabled = {{ keyOrDefault `config/dev/consul/backup/enabled` `true` }}
interval = "{{ keyOrDefault `config/dev/consul/backup/interval` `6h` }}"
retain = {{ keyOrDefault `config/dev/consul/backup/retain` `7` }}
name = "{{ keyOrDefault `config/dev/consul/backup/name` `consul-backup-{{.Timestamp}}` }}"
}
EOF
destination = "local/consul.hcl"
}
config {
command = "consul"
args = [
"agent",
"-config-dir=local"
]
}
resources {
cpu = 300
memory = 512
}
}
}
group "consul-ash3c" {
count = 1
constraint {
attribute = "${node.unique.name}"
value = "us-ash3c"
}
network {
port "http" {
static = 8500
}
port "rpc" {
static = 8300
}
port "serf_lan" {
static = 8301
}
port "serf_wan" {
static = 8302
}
}
task "consul" {
driver = "exec"
# 使用模板从Consul KV获取配置
template {
data = <<EOF
# Consul配置文件 - 从KV存储动态获取
# 遵循 config/{environment}/{provider}/{region_or_service}/{key} 格式
# 基础配置
data_dir = "{{ keyOrDefault `config/dev/consul/cluster/data_dir` `/opt/consul/data` }}"
raft_dir = "{{ keyOrDefault `config/dev/consul/cluster/raft_dir` `/opt/consul/raft` }}"
# 启用UI
ui_config {
enabled = {{ keyOrDefault `config/dev/consul/ui/enabled` `true` }}
}
# 数据中心配置
datacenter = "{{ keyOrDefault `config/dev/consul/cluster/datacenter` `dc1` }}"
# 服务器配置
server = true
bootstrap_expect = {{ keyOrDefault `config/dev/consul/cluster/bootstrap_expect` `3` }}
# 网络配置
client_addr = "{{ keyOrDefault `config/dev/consul/nodes/ash3c/hostname` `ash3c` }}"
bind_addr = "{{ keyOrDefault `config/dev/consul/nodes/ash3c/hostname` `ash3c` }}"
advertise_addr = "{{ keyOrDefault `config/dev/consul/nodes/ash3c/hostname` `ash3c` }}"
# 端口配置
ports {
dns = {{ keyOrDefault `config/dev/consul/ports/dns` `8600` }}
http = {{ keyOrDefault `config/dev/consul/ports/http` `8500` }}
https = {{ keyOrDefault `config/dev/consul/ports/https` `-1` }}
grpc = {{ keyOrDefault `config/dev/consul/ports/grpc` `8502` }}
grpc_tls = {{ keyOrDefault `config/dev/consul/ports/grpc_tls` `8503` }}
serf_lan = {{ keyOrDefault `config/dev/consul/ports/serf_lan` `8301` }}
serf_wan = {{ keyOrDefault `config/dev/consul/ports/serf_wan` `8302` }}
server = {{ keyOrDefault `config/dev/consul/ports/server` `8300` }}
}
# 集群连接 - 从KV获取其他节点IP
retry_join = [
"{{ keyOrDefault `config/dev/consul/nodes/master/hostname` `master` }}",
"{{ keyOrDefault `config/dev/consul/nodes/warden/hostname` `warden` }}"
]
# 服务发现
enable_service_script = {{ keyOrDefault `config/dev/consul/service/enable_service_script` `true` }}
enable_script_checks = {{ keyOrDefault `config/dev/consul/service/enable_script_checks` `true` }}
enable_local_script_checks = {{ keyOrDefault `config/dev/consul/service/enable_local_script_checks` `true` }}
# 性能调优
performance {
raft_multiplier = {{ keyOrDefault `config/dev/consul/performance/raft_multiplier` `1` }}
}
# 日志配置
log_level = "{{ keyOrDefault `config/dev/consul/cluster/log_level` `INFO` }}"
enable_syslog = {{ keyOrDefault `config/dev/consul/log/enable_syslog` `false` }}
log_file = "{{ keyOrDefault `config/dev/consul/log/log_file` `/var/log/consul/consul.log` }}"
# 安全配置
encrypt = "{{ keyOrDefault `config/dev/consul/cluster/encrypt_key` `YourEncryptionKeyHere` }}"
# 连接配置
reconnect_timeout = "{{ keyOrDefault `config/dev/consul/connection/reconnect_timeout` `30s` }}"
reconnect_timeout_wan = "{{ keyOrDefault `config/dev/consul/connection/reconnect_timeout_wan` `30s` }}"
session_ttl_min = "{{ keyOrDefault `config/dev/consul/connection/session_ttl_min` `10s` }}"
# Autopilot配置
autopilot {
cleanup_dead_servers = {{ keyOrDefault `config/dev/consul/autopilot/cleanup_dead_servers` `true` }}
last_contact_threshold = "{{ keyOrDefault `config/dev/consul/autopilot/last_contact_threshold` `200ms` }}"
max_trailing_logs = {{ keyOrDefault `config/dev/consul/autopilot/max_trailing_logs` `250` }}
server_stabilization_time = "{{ keyOrDefault `config/dev/consul/autopilot/server_stabilization_time` `10s` }}"
redundancy_zone_tag = ""
disable_upgrade_migration = {{ keyOrDefault `config/dev/consul/autopilot/disable_upgrade_migration` `false` }}
upgrade_version_tag = ""
}
# 快照配置
snapshot {
enabled = {{ keyOrDefault `config/dev/consul/snapshot/enabled` `true` }}
interval = "{{ keyOrDefault `config/dev/consul/snapshot/interval` `24h` }}"
retain = {{ keyOrDefault `config/dev/consul/snapshot/retain` `30` }}
name = "{{ keyOrDefault `config/dev/consul/snapshot/name` `consul-snapshot-{{.Timestamp}}` }}"
}
# 备份配置
backup {
enabled = {{ keyOrDefault `config/dev/consul/backup/enabled` `true` }}
interval = "{{ keyOrDefault `config/dev/consul/backup/interval` `6h` }}"
retain = {{ keyOrDefault `config/dev/consul/backup/retain` `7` }}
name = "{{ keyOrDefault `config/dev/consul/backup/name` `consul-backup-{{.Timestamp}}` }}"
}
EOF
destination = "local/consul.hcl"
}
config {
command = "consul"
args = [
"agent",
"-config-dir=local"
]
}
resources {
cpu = 300
memory = 512
}
}
}
group "consul-warden" {
count = 1
constraint {
attribute = "${node.unique.name}"
value = "bj-warden"
}
network {
port "http" {
static = 8500
}
port "rpc" {
static = 8300
}
port "serf_lan" {
static = 8301
}
port "serf_wan" {
static = 8302
}
}
task "consul" {
driver = "exec"
# 使用模板从Consul KV获取配置
template {
data = <<EOF
# Consul配置文件 - 从KV存储动态获取
# 遵循 config/{environment}/{provider}/{region_or_service}/{key} 格式
# 基础配置
data_dir = "{{ keyOrDefault `config/dev/consul/cluster/data_dir` `/opt/consul/data` }}"
raft_dir = "{{ keyOrDefault `config/dev/consul/cluster/raft_dir` `/opt/consul/raft` }}"
# 启用UI
ui_config {
enabled = {{ keyOrDefault `config/dev/consul/ui/enabled` `true` }}
}
# 数据中心配置
datacenter = "{{ keyOrDefault `config/dev/consul/cluster/datacenter` `dc1` }}"
# 服务器配置
server = true
bootstrap_expect = {{ keyOrDefault `config/dev/consul/cluster/bootstrap_expect` `3` }}
# 网络配置
client_addr = "{{ keyOrDefault `config/dev/consul/nodes/warden/hostname` `warden` }}"
bind_addr = "{{ keyOrDefault `config/dev/consul/nodes/warden/hostname` `warden` }}"
advertise_addr = "{{ keyOrDefault `config/dev/consul/nodes/warden/hostname` `warden` }}"
# 端口配置
ports {
dns = {{ keyOrDefault `config/dev/consul/ports/dns` `8600` }}
http = {{ keyOrDefault `config/dev/consul/ports/http` `8500` }}
https = {{ keyOrDefault `config/dev/consul/ports/https` `-1` }}
grpc = {{ keyOrDefault `config/dev/consul/ports/grpc` `8502` }}
grpc_tls = {{ keyOrDefault `config/dev/consul/ports/grpc_tls` `8503` }}
serf_lan = {{ keyOrDefault `config/dev/consul/ports/serf_lan` `8301` }}
serf_wan = {{ keyOrDefault `config/dev/consul/ports/serf_wan` `8302` }}
server = {{ keyOrDefault `config/dev/consul/ports/server` `8300` }}
}
# 集群连接 - 从KV获取其他节点IP
retry_join = [
"{{ keyOrDefault `config/dev/consul/nodes/master/hostname` `master` }}",
"{{ keyOrDefault `config/dev/consul/nodes/ash3c/hostname` `ash3c` }}"
]
# 服务发现
enable_service_script = {{ keyOrDefault `config/dev/consul/service/enable_service_script` `true` }}
enable_script_checks = {{ keyOrDefault `config/dev/consul/service/enable_script_checks` `true` }}
enable_local_script_checks = {{ keyOrDefault `config/dev/consul/service/enable_local_script_checks` `true` }}
# 性能调优
performance {
raft_multiplier = {{ keyOrDefault `config/dev/consul/performance/raft_multiplier` `1` }}
}
# 日志配置
log_level = "{{ keyOrDefault `config/dev/consul/cluster/log_level` `INFO` }}"
enable_syslog = {{ keyOrDefault `config/dev/consul/log/enable_syslog` `false` }}
log_file = "{{ keyOrDefault `config/dev/consul/log/log_file` `/var/log/consul/consul.log` }}"
# 安全配置
encrypt = "{{ keyOrDefault `config/dev/consul/cluster/encrypt_key` `YourEncryptionKeyHere` }}"
# 连接配置
reconnect_timeout = "{{ keyOrDefault `config/dev/consul/connection/reconnect_timeout` `30s` }}"
reconnect_timeout_wan = "{{ keyOrDefault `config/dev/consul/connection/reconnect_timeout_wan` `30s` }}"
session_ttl_min = "{{ keyOrDefault `config/dev/consul/connection/session_ttl_min` `10s` }}"
# Autopilot配置
autopilot {
cleanup_dead_servers = {{ keyOrDefault `config/dev/consul/autopilot/cleanup_dead_servers` `true` }}
last_contact_threshold = "{{ keyOrDefault `config/dev/consul/autopilot/last_contact_threshold` `200ms` }}"
max_trailing_logs = {{ keyOrDefault `config/dev/consul/autopilot/max_trailing_logs` `250` }}
server_stabilization_time = "{{ keyOrDefault `config/dev/consul/autopilot/server_stabilization_time` `10s` }}
redundancy_zone_tag = ""
disable_upgrade_migration = {{ keyOrDefault `config/dev/consul/autopilot/disable_upgrade_migration` `false` }}
upgrade_version_tag = ""
}
# 快照配置
snapshot {
enabled = {{ keyOrDefault `config/dev/consul/snapshot/enabled` `true` }}
interval = "{{ keyOrDefault `config/dev/consul/snapshot/interval` `24h` }}"
retain = {{ keyOrDefault `config/dev/consul/snapshot/retain` `30` }}
name = "{{ keyOrDefault `config/dev/consul/snapshot/name` `consul-snapshot-{{.Timestamp}}` }}"
}
# 备份配置
backup {
enabled = {{ keyOrDefault `config/dev/consul/backup/enabled` `true` }}
interval = "{{ keyOrDefault `config/dev/consul/backup/interval` `6h` }}"
retain = {{ keyOrDefault `config/dev/consul/backup/retain` `7` }}
name = "{{ keyOrDefault `config/dev/consul/backup/name` `consul-backup-{{.Timestamp}}` }}"
}
EOF
destination = "local/consul.hcl"
}
config {
command = "consul"
args = [
"agent",
"-config-dir=local"
]
}
resources {
cpu = 300
memory = 512
}
}
}
}

View File

@ -1,225 +0,0 @@
job "consul-cluster-simple" {
datacenters = ["dc1"]
type = "service"
group "consul-master" {
count = 1
constraint {
attribute = "${node.unique.name}"
value = "kr-master"
}
network {
port "http" {
static = 8500
}
port "rpc" {
static = 8300
}
port "serf_lan" {
static = 8301
}
port "serf_wan" {
static = 8302
}
}
task "consul" {
driver = "exec"
config {
command = "consul"
args = [
"agent",
"-server",
"-bootstrap-expect=3",
"-data-dir=/opt/nomad/data/consul",
"-client=0.0.0.0",
"-bind=0.0.0.0",
"-advertise=100.117.106.136",
"-retry-join=100.116.80.94",
"-retry-join=100.122.197.112",
"-ui",
"-http-port=${NOMAD_PORT_http}",
"-server-port=${NOMAD_PORT_rpc}",
"-serf-lan-port=${NOMAD_PORT_serf_lan}",
"-serf-wan-port=${NOMAD_PORT_serf_wan}"
]
}
resources {
cpu = 300
memory = 512
}
}
}
group "consul-ash3c" {
count = 1
constraint {
attribute = "${node.unique.name}"
value = "us-ash3c"
}
network {
port "http" {
static = 8500
}
port "rpc" {
static = 8300
}
port "serf_lan" {
static = 8301
}
port "serf_wan" {
static = 8302
}
}
task "consul" {
driver = "exec"
config {
command = "consul"
args = [
"agent",
"-server",
"-bootstrap-expect=3",
"-data-dir=/opt/nomad/data/consul",
"-client=0.0.0.0",
"-bind=0.0.0.0",
"-advertise=100.116.80.94",
"-retry-join=100.117.106.136",
"-retry-join=100.122.197.112",
"-ui",
"-http-port=${NOMAD_PORT_http}",
"-server-port=${NOMAD_PORT_rpc}",
"-serf-lan-port=${NOMAD_PORT_serf_lan}",
"-serf-wan-port=${NOMAD_PORT_serf_wan}"
]
}
resources {
cpu = 300
memory = 512
}
}
}
group "consul-warden" {
count = 1
constraint {
attribute = "${node.unique.name}"
value = "bj-warden"
}
network {
port "http" {
static = 8500
}
port "rpc" {
static = 8300
}
port "serf_lan" {
static = 8301
}
port "serf_wan" {
static = 8302
}
}
task "consul" {
driver = "exec"
config {
command = "consul"
args = [
"agent",
"-server",
"-bootstrap-expect=3",
"-data-dir=/opt/nomad/data/consul",
"-client=0.0.0.0",
"-bind=100.122.197.112",
"-advertise=100.122.197.112",
"-retry-join=100.117.106.136",
"-retry-join=100.116.80.94",
"-ui",
"-http-port=${NOMAD_PORT_http}",
"-server-port=${NOMAD_PORT_rpc}",
"-serf-lan-port=${NOMAD_PORT_serf_lan}",
"-serf-wan-port=${NOMAD_PORT_serf_wan}"
]
}
resources {
cpu = 300
memory = 512
}
}
}
group "consul-semaphore" {
count = 1
constraint {
attribute = "${node.unique.name}"
value = "semaphore"
}
network {
port "http" {
static = 8500
}
port "rpc" {
static = 8300
}
port "serf_lan" {
static = 8301
}
port "serf_wan" {
static = 8302
}
}
task "consul" {
driver = "exec"
config {
command = "consul"
args = [
"agent",
"-server",
"-bootstrap-expect=3",
"-data-dir=/opt/nomad/data/consul",
"-client=0.0.0.0",
"-bind=100.116.158.95",
"-advertise=100.116.158.95",
"-retry-join=100.117.106.136",
"-retry-join=100.116.80.94",
"-retry-join=100.122.197.112",
"-ui",
"-http-port=${NOMAD_PORT_http}",
"-server-port=${NOMAD_PORT_rpc}",
"-serf-lan-port=${NOMAD_PORT_serf_lan}",
"-serf-wan-port=${NOMAD_PORT_serf_wan}"
]
}
resources {
cpu = 300
memory = 512
}
}
}
}

View File

@ -1,57 +1,115 @@
job "consul-cluster" {
job "consul-cluster-nomad" {
datacenters = ["dc1"]
type = "service"
group "consul-servers" {
count = 3
group "consul-master" {
constraint {
attribute = "${node.unique.name}"
operator = "regexp"
value = "(master|ash3c|hcp)"
value = "master"
}
task "consul" {
driver = "podman"
driver = "exec"
config {
image = "hashicorp/consul:latest"
ports = ["server", "serf_lan", "serf_wan", "ui"]
command = "consul"
args = [
"agent",
"-server",
"-bootstrap-expect=3",
"-data-dir=/consul/data",
"-ui",
"-data-dir=/opt/nomad/data/consul",
"-client=0.0.0.0",
"-bind={{ env `NOMAD_IP_server` }}",
"-retry-join=100.117.106.136",
"-bind=100.117.106.136",
"-advertise=100.117.106.136",
"-retry-join=100.116.80.94",
"-retry-join=100.76.13.187"
"-retry-join=100.122.197.112",
"-ui",
"-http-port=8500",
"-server-port=8300",
"-serf-lan-port=8301",
"-serf-wan-port=8302"
]
}
volume_mount {
volume = "consul-data"
destination = "/consul/data"
read_only = false
resources {
cpu = 300
memory = 512
}
}
}
group "consul-ash3c" {
constraint {
attribute = "${node.unique.name}"
value = "ash3c"
}
task "consul" {
driver = "exec"
config {
command = "consul"
args = [
"agent",
"-server",
"-bootstrap-expect=3",
"-data-dir=/opt/nomad/data/consul",
"-client=0.0.0.0",
"-bind=100.116.80.94",
"-advertise=100.116.80.94",
"-retry-join=100.117.106.136",
"-retry-join=100.122.197.112",
"-ui",
"-http-port=8500",
"-server-port=8300",
"-serf-lan-port=8301",
"-serf-wan-port=8302"
]
}
resources {
network {
mbits = 10
port "server" { static = 8300 }
port "serf_lan" { static = 8301 }
port "serf_wan" { static = 8302 }
port "ui" { static = 8500 }
}
cpu = 300
memory = 512
}
}
}
group "consul-warden" {
constraint {
attribute = "${node.unique.name}"
value = "warden"
}
volume "consul-data" {
type = "host"
read_only = false
source = "consul-data"
task "consul" {
driver = "exec"
config {
command = "consul"
args = [
"agent",
"-server",
"-bootstrap-expect=3",
"-data-dir=/opt/nomad/data/consul",
"-client=0.0.0.0",
"-bind=100.122.197.112",
"-advertise=100.122.197.112",
"-retry-join=100.117.106.136",
"-retry-join=100.116.80.94",
"-ui",
"-http-port=8500",
"-server-port=8300",
"-serf-lan-port=8301",
"-serf-wan-port=8302"
]
}
resources {
cpu = 300
memory = 512
}
}
}
}

View File

@ -0,0 +1,8 @@
# Nomad Configuration
## Jobs
- `install-podman-driver.nomad` - installs the Podman driver
- `nomad-consul-config.nomad` - Nomad-Consul configuration
- `nomad-consul-setup.nomad` - Nomad-Consul setup
- `nomad-nfs-volume.nomad` - NFS volume configuration
A deployment sketch follows below.
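A deployment sketch for these jobs, assuming they live under `components/nomad/jobs/` in the same layout as the Consul and Traefik jobs in the neighbouring READMEs (the directory is an assumption):
```bash
# Install the Podman driver first, then apply the Consul integration config.
nomad job run components/nomad/jobs/install-podman-driver.nomad
nomad job run components/nomad/jobs/nomad-consul-setup.nomad
nomad job status        # verify the system jobs are running on every node
```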

View File

@ -0,0 +1,55 @@
job "nomad-consul-config" {
datacenters = ["dc1"]
type = "system"
group "nomad-server-config" {
constraint {
attribute = "${node.unique.name}"
operator = "regexp"
value = "semaphore|ash1d|ash2e|ch2|ch3|onecloud1|de"
}
task "update-nomad-config" {
driver = "exec"
config {
command = "sh"
args = [
"-c",
"sed -i '/^consul {/,/^}/c\\consul {\\n address = \"master.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500\"\\n server_service_name = \"nomad\"\\n client_service_name = \"nomad-client\"\\n auto_advertise = true\\n server_auto_join = true\\n client_auto_join = false\\n}' /etc/nomad.d/nomad.hcl && systemctl restart nomad"
]
}
resources {
cpu = 100
memory = 128
}
}
}
group "nomad-client-config" {
constraint {
attribute = "${node.unique.name}"
operator = "regexp"
value = "master|ash3c|browser|influxdb1|hcp1|warden"
}
task "update-nomad-config" {
driver = "exec"
config {
command = "sh"
args = [
"-c",
"sed -i '/^consul {/,/^}/c\\consul {\\n address = \"master.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500\"\\n server_service_name = \"nomad\"\\n client_service_name = \"nomad-client\"\\n auto_advertise = true\\n server_auto_join = false\\n client_auto_join = true\\n}' /etc/nomad.d/nomad.hcl && systemctl restart nomad"
]
}
resources {
cpu = 100
memory = 128
}
}
}
}
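The embedded `sed` one-liners above are hard to read; for reference, on server nodes they rewrite the `consul { ... }` block of `/etc/nomad.d/nomad.hcl` to the following (shown as a bash heredoc; client nodes get the same block with `server_auto_join = false` and `client_auto_join = true`):
```bash
cat <<'EOF'
consul {
  address = "master.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500"
  server_service_name = "nomad"
  client_service_name = "nomad-client"
  auto_advertise = true
  server_auto_join = true
  client_auto_join = false
}
EOF
```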

View File

@ -0,0 +1,23 @@
job "nomad-consul-setup" {
datacenters = ["dc1"]
type = "system"
group "nomad-config" {
task "setup-consul" {
driver = "exec"
config {
command = "sh"
args = [
"-c",
"if grep -q 'server.*enabled.*true' /etc/nomad.d/nomad.hcl; then sed -i '/^consul {/,/^}/c\\consul {\\n address = \"master.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500\"\\n server_service_name = \"nomad\"\\n client_service_name = \"nomad-client\"\\n auto_advertise = true\\n server_auto_join = true\\n client_auto_join = false\\n}' /etc/nomad.d/nomad.hcl; else sed -i '/^consul {/,/^}/c\\consul {\\n address = \"master.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500\"\\n server_service_name = \"nomad\"\\n client_service_name = \"nomad-client\"\\n auto_advertise = true\\n server_auto_join = false\\n client_auto_join = true\\n}' /etc/nomad.d/nomad.hcl; fi && systemctl restart nomad"
]
}
resources {
cpu = 100
memory = 128
}
}
}
}

View File

@ -0,0 +1,28 @@
# Traefik Configuration
## Deployment
```bash
nomad job run components/traefik/jobs/traefik.nomad
```
## Configuration Highlights
- Binds explicitly to the Tailscale IP (100.97.62.111)
- Consul cluster ordering optimized by geography (Beijing → Korea → USA)
- Relaxed health checks suited to trans-Pacific network latency
- No service health check, to avoid flapping
## Access
- Dashboard: `http://hcp1.tailnet-68f9.ts.net:8080/dashboard/`
- Direct IP: `http://100.97.62.111:8080/dashboard/`
- Consul LB: `http://hcp1.tailnet-68f9.ts.net:80`
## Troubleshooting
If services flap:
1. Check whether RFC1918 private addresses are being used
2. Confirm Tailscale network connectivity
3. Adjust the health-check interval
4. Account for the effect of geographic distance on network latency
A quick connectivity check is sketched below.
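A quick connectivity check using the endpoints listed above:
```bash
# The dashboard should answer on the Tailscale hostname and on the raw Tailscale IP.
curl -sf http://hcp1.tailnet-68f9.ts.net:8080/dashboard/ -o /dev/null && echo "dashboard OK"
curl -sf http://100.97.62.111:8080/dashboard/ -o /dev/null && echo "dashboard (IP) OK"
# The Consul load balancer on port 80 should proxy the leader endpoint.
curl -s http://hcp1.tailnet-68f9.ts.net/v1/status/leader
```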

View File

@ -0,0 +1,97 @@
job "traefik-consul-lb" {
datacenters = ["dc1"]
type = "service"
group "traefik" {
count = 1
constraint {
attribute = "${node.unique.name}"
value = "hcp1"
}
update {
min_healthy_time = "5s"
healthy_deadline = "10m"
progress_deadline = "15m"
auto_revert = false
}
network {
mode = "host"
port "http" {
static = 80
host_network = "tailscale0"
}
port "traefik" {
static = 8080
host_network = "tailscale0"
}
}
task "traefik" {
driver = "exec"
config {
command = "/usr/local/bin/traefik"
args = [
"--configfile=/local/traefik.yml"
]
}
template {
data = <<EOF
api:
dashboard: true
insecure: true
entryPoints:
web:
address: "100.97.62.111:80"
traefik:
address: "100.97.62.111:8080"
providers:
file:
filename: /local/dynamic.yml
watch: true
log:
level: INFO
EOF
destination = "local/traefik.yml"
}
template {
data = <<EOF
http:
services:
consul-cluster:
loadBalancer:
servers:
- url: "http://warden.tailnet-68f9.ts.net:8500" # 北京,优先
- url: "http://master.tailnet-68f9.ts.net:8500" # 备用
- url: "http://ash3c.tailnet-68f9.ts.net:8500" # 备用
healthCheck:
path: "/v1/status/leader"
interval: "30s"
timeout: "15s"
routers:
consul-api:
rule: "PathPrefix(`/`)"
service: consul-cluster
entryPoints:
- web
EOF
destination = "local/dynamic.yml"
}
resources {
cpu = 500
memory = 512
}
}
}
}

View File

@ -0,0 +1,7 @@
# Vault Configuration
## Jobs
- `vault-cluster-exec.nomad` - Vault cluster (exec driver)
- `vault-cluster-podman.nomad` - Vault cluster (podman driver)
- `vault-dev-warden.nomad` - Vault development environment

View File

@ -39,8 +39,14 @@ job "vault-cluster-exec" {
template {
data = <<EOH
storage "file" {
path = "/opt/nomad/data/vault/data"
storage "consul" {
address = "{{ with nomadService "consul" }}{{ range . }}{{ if contains .Tags "http" }}{{ .Address }}:{{ .Port }}{{ end }}{{ end }}{{ end }}"
path = "vault/"
# Consul service discovery configuration
service {
name = "vault"
tags = ["vault"]
}
}
listener "tcp" {
@ -58,20 +64,12 @@ disable_mlock = true
disable_sealwrap = true
disable_cache = false
# Configure the Consul connection
consul {
address = "127.0.0.1:8500"
path = "vault/"
# Note: a token may be required
# token = "your-consul-token"
}
# Enable raw log output
enable_raw_log = true
# Configure the Consul connection
consul {
address = "127.0.0.1:8500"
path = "vault/"
# Note: a token may be required
# token = "your-consul-token"
# Integrate with Nomad service discovery
service_registration {
enabled = true
}
EOH
destination = "/opt/nomad/data/vault/config/vault.hcl"
@ -100,14 +98,7 @@ EOH
group "vault-ash3c" {
count = 1
# Explicitly require a Consul version, overriding the automatic constraint
constraint {
attribute = "${attr.consul.version}"
operator = "version"
value = ">= 1.0.0"
}
# Add a constraint that is always satisfied to guarantee scheduling
# Remove the Consul version constraint and use a driver constraint instead
constraint {
attribute = "${driver.exec}"
operator = "="
@ -141,8 +132,14 @@ EOH
template {
data = <<EOH
storage "file" {
path = "/opt/nomad/data/vault/data"
storage "consul" {
address = "{{ with nomadService "consul" }}{{ range . }}{{ if contains .Tags "http" }}{{ .Address }}:{{ .Port }}{{ end }}{{ end }}{{ end }}"
path = "vault/"
# Consul service discovery configuration
service {
name = "vault"
tags = ["vault"]
}
}
listener "tcp" {
@ -159,6 +156,14 @@ disable_mlock = true
# Additional settings to work around permission issues
disable_sealwrap = true
disable_cache = false
# Enable raw log output
enable_raw_log = true
# Integrate with Nomad service discovery
service_registration {
enabled = true
}
EOH
destination = "/opt/nomad/data/vault/config/vault.hcl"
}
@ -186,14 +191,7 @@ EOH
group "vault-warden" {
count = 1
# Explicitly require a Consul version, overriding the automatic constraint
constraint {
attribute = "${attr.consul.version}"
operator = "version"
value = ">= 1.0.0"
}
# Add a constraint that is always satisfied to guarantee scheduling
# Remove the Consul version constraint and use a driver constraint instead
constraint {
attribute = "${driver.exec}"
operator = "="
@ -227,8 +225,14 @@ EOH
template {
data = <<EOH
storage "file" {
path = "/opt/nomad/data/vault/data"
storage "consul" {
address = "{{ with nomadService "consul" }}{{ range . }}{{ if contains .Tags "http" }}{{ .Address }}:{{ .Port }}{{ end }}{{ end }}{{ end }}"
path = "vault/"
# Consul service discovery configuration
service {
name = "vault"
tags = ["vault"]
}
}
listener "tcp" {
@ -245,6 +249,14 @@ disable_mlock = true
# Additional settings to work around permission issues
disable_sealwrap = true
disable_cache = false
# Enable raw log output
enable_raw_log = true
# Integrate with Nomad service discovery
service_registration {
enabled = true
}
EOH
destination = "/opt/nomad/data/vault/config/vault.hcl"
}

View File

@ -1,21 +1,23 @@
[nomad_servers]
# Server nodes (7 server nodes)
# Local machine, do not manage from here: bj-semaphore ansible_host=100.116.158.95 ansible_user=root ansible_password=3131 ansible_become_password=3131
ash1d ansible_host=100.81.26.3 ansible_user=ben ansible_password=3131 ansible_become_password=3131
ash2e ansible_host=100.103.147.94 ansible_user=ben ansible_password=3131 ansible_become_password=3131
ch2 ansible_host=100.90.159.68 ansible_user=ben ansible_password=3131 ansible_become_password=3131
ch3 ansible_host=100.86.141.112 ansible_user=ben ansible_password=3131 ansible_become_password=3131
onecloud1 ansible_host=100.98.209.50 ansible_user=ben ansible_password=3131 ansible_become_password=3131
de ansible_host=100.120.225.29 ansible_user=ben ansible_password=3131 ansible_become_password=3131
# ⚠️ Warning: with great power comes great responsibility! Operations on server nodes require extreme caution!
# ⚠️ Any operation on a server node can affect the stability of the entire cluster!
semaphore ansible_host=semaphore.tailnet-68f9.ts.net ansible_user=root ansible_password=313131 ansible_become_password=313131
ash1d ansible_host=ash1d.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
ash2e ansible_host=ash2e.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
ch2 ansible_host=ch2.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
ch3 ansible_host=ch3.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
onecloud1 ansible_host=onecloud1.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
de ansible_host=de.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
[nomad_clients]
# Client nodes
master ansible_host=100.117.106.136 ansible_user=ben ansible_password=3131 ansible_become_password=3131 ansible_port=60022
ash3c ansible_host=100.116.80.94 ansible_user=ben ansible_password=3131 ansible_become_password=3131
browser ansible_host=100.116.112.45 ansible_user=ben ansible_password=3131 ansible_become_password=3131
influxdb1 ansible_host=100.116.80.94 ansible_user=ben ansible_password=3131 ansible_become_password=3131
hcp1 ansible_host=100.97.62.111 ansible_user=root ansible_password=3131 ansible_become_password=3131
warden ansible_host=100.122.197.112 ansible_user=ben ansible_password=3131 ansible_become_password=3131
master ansible_host=master.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131 ansible_port=60022
ash3c ansible_host=ash3c.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
browser ansible_host=browser.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
influxdb1 ansible_host=influxdb1.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
hcp1 ansible_host=hcp1.tailnet-68f9.ts.net ansible_user=root ansible_password=3131 ansible_become_password=3131
warden ansible_host=warden.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
[nomad_nodes:children]
nomad_servers

View File

@ -4,17 +4,6 @@
become: yes
vars:
nomad_config_dir: /etc/nomad.d
client_ip: "{{ ansible_host }}"
# Nomad node name (with a geographic prefix)
client_name: >-
{%- if inventory_hostname == 'influxdb1' -%}us-influxdb
{%- elif inventory_hostname == 'master' -%}kr-master
{%- elif inventory_hostname == 'hcp1' -%}bj-hcp1
{%- elif inventory_hostname == 'hcp2' -%}bj-hcp2
{%- elif inventory_hostname == 'warden' -%}bj-warden
{%- else -%}{{ inventory_hostname }}
{%- endif -%}
tasks:
- name: Create the Nomad configuration directory

View File

@ -1,104 +0,0 @@
---
- name: 配置Nomad客户端节点
hosts: target_nodes
become: yes
vars:
nomad_config_dir: /etc/nomad.d
tasks:
- name: 创建Nomad配置目录
file:
path: "{{ nomad_config_dir }}"
state: directory
owner: root
group: root
mode: '0755'
- name: 复制Nomad客户端配置
copy:
content: |
datacenter = "dc1"
data_dir = "/opt/nomad/data"
log_level = "INFO"
bind_addr = "0.0.0.0"
server {
enabled = false
}
client {
enabled = true
# 配置七姐妹服务器地址
servers = [
"100.116.158.95:4647", # bj-semaphore
"100.81.26.3:4647", # ash1d
"100.103.147.94:4647", # ash2e
"100.90.159.68:4647", # ch2
"100.86.141.112:4647", # ch3
"100.98.209.50:4647", # bj-onecloud1
"100.120.225.29:4647" # de
]
host_volume "fnsync" {
path = "/mnt/fnsync"
read_only = false
}
# 禁用Docker驱动只使用Podman
options {
"driver.raw_exec.enable" = "1"
"driver.exec.enable" = "1"
}
}
# 配置Podman插件目录
plugin_dir = "/opt/nomad/plugins"
addresses {
http = "{{ ansible_host }}"
rpc = "{{ ansible_host }}"
serf = "{{ ansible_host }}"
}
advertise {
http = "{{ ansible_host }}:4646"
rpc = "{{ ansible_host }}:4647"
serf = "{{ ansible_host }}:4648"
}
consul {
address = "100.116.158.95:8500"
}
# 配置Podman驱动
plugin "podman" {
config {
volumes {
enabled = true
}
logging {
type = "journald"
}
gc {
container = true
}
}
}
dest: "{{ nomad_config_dir }}/nomad.hcl"
owner: root
group: root
mode: '0644'
- name: 启动Nomad服务
systemd:
name: nomad
state: restarted
enabled: yes
daemon_reload: yes
- name: 检查Nomad服务状态
command: systemctl status nomad
register: nomad_status
changed_when: false
- name: 显示Nomad服务状态
debug:
var: nomad_status.stdout_lines

View File

@ -1,104 +0,0 @@
---
- name: 配置Nomad客户端节点
hosts: target_nodes
become: yes
vars:
nomad_config_dir: /etc/nomad.d
tasks:
- name: 创建Nomad配置目录
file:
path: "{{ nomad_config_dir }}"
state: directory
owner: root
group: root
mode: '0755'
- name: 复制Nomad客户端配置
copy:
content: |
datacenter = "dc1"
data_dir = "/opt/nomad/data"
log_level = "INFO"
bind_addr = "0.0.0.0"
server {
enabled = false
}
client {
enabled = true
# 配置七姐妹服务器地址
servers = [
"100.116.158.95:4647", # bj-semaphore
"100.81.26.3:4647", # ash1d
"100.103.147.94:4647", # ash2e
"100.90.159.68:4647", # ch2
"100.86.141.112:4647", # ch3
"100.98.209.50:4647", # bj-onecloud1
"100.120.225.29:4647" # de
]
host_volume "fnsync" {
path = "/mnt/fnsync"
read_only = false
}
# 禁用Docker驱动只使用Podman
options {
"driver.raw_exec.enable" = "1"
"driver.exec.enable" = "1"
}
}
# 配置Podman插件目录
plugin_dir = "/opt/nomad/plugins"
addresses {
http = "{{ ansible_host }}"
rpc = "{{ ansible_host }}"
serf = "{{ ansible_host }}"
}
advertise {
http = "{{ ansible_host }}:4646"
rpc = "{{ ansible_host }}:4647"
serf = "{{ ansible_host }}:4648"
}
consul {
address = "100.116.158.95:8500"
}
# 配置Podman驱动
plugin "podman" {
config {
volumes {
enabled = true
}
logging {
type = "journald"
}
gc {
container = true
}
}
}
dest: "{{ nomad_config_dir }}/nomad.hcl"
owner: root
group: root
mode: '0644'
- name: 启动Nomad服务
systemd:
name: nomad
state: restarted
enabled: yes
daemon_reload: yes
- name: 检查Nomad服务状态
command: systemctl status nomad
register: nomad_status
changed_when: false
- name: 显示Nomad服务状态
debug:
var: nomad_status.stdout_lines

View File

@ -0,0 +1,44 @@
---
- name: Apply a unified configuration to all Nomad nodes
hosts: nomad_nodes
become: yes
tasks:
- name: Back up the current Nomad configuration
copy:
src: /etc/nomad.d/nomad.hcl
dest: /etc/nomad.d/nomad.hcl.bak
remote_src: yes
ignore_errors: yes
- name: Generate the unified Nomad configuration
template:
src: ../templates/nomad-unified.hcl.j2
dest: /etc/nomad.d/nomad.hcl
owner: root
group: root
mode: '0644'
- name: Restart the Nomad service
systemd:
name: nomad
state: restarted
enabled: yes
daemon_reload: yes
- name: Wait for the Nomad service to become ready
wait_for:
port: 4646
host: "{{ inventory_hostname }}.tailnet-68f9.ts.net"
delay: 10
timeout: 60
ignore_errors: yes
- name: 检查Nomad服务状态
command: systemctl status nomad
register: nomad_status
changed_when: false
- name: 显示Nomad服务状态
debug:
var: nomad_status.stdout_lines

View File

@ -1,105 +0,0 @@
---
- name: 部署韩国节点Nomad配置
hosts: ch2,ch3
become: yes
gather_facts: no
vars:
nomad_config_dir: "/etc/nomad.d"
nomad_config_file: "{{ nomad_config_dir }}/nomad.hcl"
source_config_dir: "/root/mgmt/infrastructure/configs/server"
tasks:
- name: 获取主机名短名称(去掉.global后缀
set_fact:
short_hostname: "{{ inventory_hostname | regex_replace('\\.global$', '') }}"
- name: 确保 Nomad 配置目录存在
file:
path: "{{ nomad_config_dir }}"
state: directory
owner: root
group: root
mode: '0755'
- name: 部署 Nomad 配置文件到韩国节点
copy:
src: "{{ source_config_dir }}/nomad-{{ short_hostname }}.hcl"
dest: "{{ nomad_config_file }}"
owner: root
group: root
mode: '0644'
backup: yes
notify: restart nomad
- name: 检查 Nomad 二进制文件位置
shell: which nomad || find /usr -name nomad 2>/dev/null | head -1
register: nomad_binary_path
failed_when: nomad_binary_path.stdout == ""
- name: 创建/更新 Nomad systemd 服务文件
copy:
dest: "/etc/systemd/system/nomad.service"
owner: root
group: root
mode: '0644'
content: |
[Unit]
Description=Nomad
Documentation=https://www.nomadproject.io/
Requires=network-online.target
After=network-online.target
[Service]
Type=notify
User=root
Group=root
ExecStart={{ nomad_binary_path.stdout }} agent -config=/etc/nomad.d/nomad.hcl
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
notify: restart nomad
- name: 确保 Nomad 数据目录存在
file:
path: "/opt/nomad/data"
state: directory
owner: root
group: root
mode: '0755'
- name: 重新加载 systemd daemon
systemd:
daemon_reload: yes
- name: 启用并启动 Nomad 服务
systemd:
name: nomad
enabled: yes
state: started
- name: 等待 Nomad 服务启动
wait_for:
port: 4646
host: "{{ ansible_host }}"
delay: 5
timeout: 30
ignore_errors: yes
- name: 显示 Nomad 服务状态
command: systemctl status nomad
register: nomad_status
changed_when: false
- name: 显示 Nomad 服务状态信息
debug:
var: nomad_status.stdout_lines
handlers:
- name: restart nomad
systemd:
name: nomad
state: restarted

View File

@ -1,105 +0,0 @@
---
- name: 部署韩国节点Nomad配置
hosts: ch2,ch3
become: yes
gather_facts: no
vars:
nomad_config_dir: "/etc/nomad.d"
nomad_config_file: "{{ nomad_config_dir }}/nomad.hcl"
source_config_dir: "/root/mgmt/infrastructure/configs/server"
tasks:
- name: 获取主机名短名称(去掉.global后缀
set_fact:
short_hostname: "{{ inventory_hostname | regex_replace('\\.global$', '') }}"
- name: 确保 Nomad 配置目录存在
file:
path: "{{ nomad_config_dir }}"
state: directory
owner: root
group: root
mode: '0755'
- name: 部署 Nomad 配置文件到韩国节点
copy:
src: "{{ source_config_dir }}/nomad-{{ short_hostname }}.hcl"
dest: "{{ nomad_config_file }}"
owner: root
group: root
mode: '0644'
backup: yes
notify: restart nomad
- name: 检查 Nomad 二进制文件位置
shell: which nomad || find /usr -name nomad 2>/dev/null | head -1
register: nomad_binary_path
failed_when: nomad_binary_path.stdout == ""
- name: 创建/更新 Nomad systemd 服务文件
copy:
dest: "/etc/systemd/system/nomad.service"
owner: root
group: root
mode: '0644'
content: |
[Unit]
Description=Nomad
Documentation=https://www.nomadproject.io/
Requires=network-online.target
After=network-online.target
[Service]
Type=notify
User=root
Group=root
ExecStart={{ nomad_binary_path.stdout }} agent -config=/etc/nomad.d/nomad.hcl
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
notify: restart nomad
- name: 确保 Nomad 数据目录存在
file:
path: "/opt/nomad/data"
state: directory
owner: root
group: root
mode: '0755'
- name: 重新加载 systemd daemon
systemd:
daemon_reload: yes
- name: 启用并启动 Nomad 服务
systemd:
name: nomad
enabled: yes
state: started
- name: 等待 Nomad 服务启动
wait_for:
port: 4646
host: "{{ ansible_host }}"
delay: 5
timeout: 30
ignore_errors: yes
- name: 显示 Nomad 服务状态
command: systemctl status nomad
register: nomad_status
changed_when: false
- name: 显示 Nomad 服务状态信息
debug:
var: nomad_status.stdout_lines
handlers:
- name: restart nomad
systemd:
name: nomad
state: restarted

View File

@ -0,0 +1,73 @@
---
- name: 修正Nomad节点的Consul角色配置
hosts: nomad_nodes
become: yes
vars:
consul_addresses: "master.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500"
tasks:
- name: 备份原始Nomad配置
copy:
src: /etc/nomad.d/nomad.hcl
dest: /etc/nomad.d/nomad.hcl.bak_{{ ansible_date_time.iso8601 }}
remote_src: yes
- name: 检查节点角色
shell: grep -A 1 "server {" /etc/nomad.d/nomad.hcl | grep "enabled = true" | wc -l
register: is_server
changed_when: false
- name: 检查节点角色
shell: grep -A 1 "client {" /etc/nomad.d/nomad.hcl | grep "enabled = true" | wc -l
register: is_client
changed_when: false
- name: 修正服务器节点的Consul配置
blockinfile:
path: /etc/nomad.d/nomad.hcl
marker: "# {mark} ANSIBLE MANAGED BLOCK - CONSUL CONFIG"
block: |
consul {
address = "{{ consul_addresses }}"
server_service_name = "nomad"
client_service_name = "nomad-client"
auto_advertise = true
server_auto_join = true
client_auto_join = false
}
when: is_server.stdout == "1"
- name: 修正客户端节点的Consul配置
blockinfile:
path: /etc/nomad.d/nomad.hcl
marker: "# {mark} ANSIBLE MANAGED BLOCK - CONSUL CONFIG"
block: |
consul {
address = "{{ consul_addresses }}"
server_service_name = "nomad"
client_service_name = "nomad-client"
auto_advertise = true
server_auto_join = false
client_auto_join = true
}
when: is_client.stdout == "1"
- name: 重启Nomad服务
systemd:
name: nomad
state: restarted
enabled: yes
daemon_reload: yes
- name: 等待Nomad服务启动
wait_for:
port: 4646
host: "{{ ansible_host }}"
timeout: 30
- name: 显示节点角色和配置
debug:
msg: "节点 {{ inventory_hostname }} 是 {{ '服务器' if is_server.stdout == '1' else '客户端' }} 节点Consul配置已更新"

View File

@ -0,0 +1,43 @@
---
- name: 更新所有Nomad节点的Consul配置
hosts: nomad_nodes
become: yes
vars:
consul_addresses: "master.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500"
tasks:
- name: 备份原始Nomad配置
copy:
src: /etc/nomad.d/nomad.hcl
dest: /etc/nomad.d/nomad.hcl.backup.{{ ansible_date_time.epoch }}
remote_src: yes
backup: yes
- name: 更新Nomad Consul配置
lineinfile:
path: /etc/nomad.d/nomad.hcl
regexp: '^\s*address\s*=\s*".*"'
line: ' address = "{{ consul_addresses }}"'
state: present
- name: 重启Nomad服务
systemd:
name: nomad
state: restarted
enabled: yes
daemon_reload: yes
- name: 等待Nomad服务启动
wait_for:
port: 4646
host: "{{ ansible_host }}"
timeout: 30
- name: 检查Nomad服务状态
systemd:
name: nomad
register: nomad_status
- name: 显示Nomad服务状态
debug:
msg: "节点 {{ inventory_hostname }} Nomad服务状态: {{ nomad_status.status.ActiveState }}"

View File

@ -0,0 +1,26 @@
---
- name: 紧急回滚 - 恢复直连Consul配置
hosts: nomad_nodes
become: yes
tasks:
- name: 🚨 紧急回滚Consul配置
replace:
path: /etc/nomad.d/nomad.hcl
regexp: 'address = "hcp1.tailnet-68f9.ts.net:80"'
replace: 'address = "100.117.106.136:8500"'
notify: restart nomad
- name: ✅ 验证回滚配置
shell: grep "address.*=" /etc/nomad.d/nomad.hcl
register: rollback_config
- name: 📋 显示回滚后配置
debug:
msg: "回滚后配置: {{ rollback_config.stdout }}"
handlers:
- name: restart nomad
systemd:
name: nomad
state: restarted

View File

@ -2,20 +2,20 @@ datacenter = "dc1"
data_dir = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level = "INFO"
name = "{{ client_name }}"
name = "{{ inventory_hostname }}"
bind_addr = "{{ client_ip }}"
bind_addr = "{{ inventory_hostname }}.tailnet-68f9.ts.net"
addresses {
http = "{{ client_ip }}"
rpc = "{{ client_ip }}"
serf = "{{ client_ip }}"
http = "{{ inventory_hostname }}.tailnet-68f9.ts.net"
rpc = "{{ inventory_hostname }}.tailnet-68f9.ts.net"
serf = "{{ inventory_hostname }}.tailnet-68f9.ts.net"
}
advertise {
http = "{{ client_ip }}:4646"
rpc = "{{ client_ip }}:4647"
serf = "{{ client_ip }}:4648"
http = "{{ inventory_hostname }}.tailnet-68f9.ts.net:4646"
rpc = "{{ inventory_hostname }}.tailnet-68f9.ts.net:4647"
serf = "{{ inventory_hostname }}.tailnet-68f9.ts.net:4648"
}
ports {
@ -30,15 +30,17 @@ server {
client {
enabled = true
# 配置七仙女服务器地址使用短名
network_interface = "tailscale0"
# 配置七仙女服务器地址使用完整FQDN
servers = [
"semaphore:4647", # bj-semaphore
"ash1d:4647", # ash1d
"ash2e:4647", # ash2e
"ch2:4647", # ch2
"ch3:4647", # ch3
"onecloud1:4647", # bj-onecloud1
"de:4647" # de
"semaphore.tailnet-68f9.ts.net:4647",
"ash1d.tailnet-68f9.ts.net:4647",
"ash2e.tailnet-68f9.ts.net:4647",
"ch2.tailnet-68f9.ts.net:4647",
"ch3.tailnet-68f9.ts.net:4647",
"onecloud1.tailnet-68f9.ts.net:4647",
"de.tailnet-68f9.ts.net:4647"
]
# 配置host volumes
@ -52,6 +54,18 @@ client {
"driver.raw_exec.enable" = "1"
"driver.exec.enable" = "1"
}
# 配置节点元数据
meta {
consul = "true"
consul_version = "1.21.5"
consul_server = {% if inventory_hostname in ['master', 'ash3c', 'warden'] %}"true"{% else %}"false"{% endif %}
}
# 激进的垃圾清理策略
gc_interval = "5m"
gc_disk_usage_threshold = 80
gc_inode_usage_threshold = 70
}
plugin "nomad-driver-podman" {
@ -64,13 +78,26 @@ plugin "nomad-driver-podman" {
}
consul {
address = "master:8500,ash3c:8500,warden:8500"
address = "master.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500"
server_service_name = "nomad"
client_service_name = "nomad-client"
auto_advertise = true
server_auto_join = true
client_auto_join = true
}
vault {
enabled = true
address = "http://master:8200,http://ash3c:8200,http://warden:8200"
address = "http://master.tailnet-68f9.ts.net:8200,http://ash3c.tailnet-68f9.ts.net:8200,http://warden.tailnet-68f9.ts.net:8200"
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
create_from_role = "nomad-cluster"
tls_skip_verify = true
}
telemetry {
collection_interval = "1s"
disable_hostname = false
prometheus_metrics = true
publish_allocation_metrics = true
publish_node_metrics = true
}

View File

@ -4,12 +4,18 @@ plugin_dir = "/opt/nomad/plugins"
log_level = "INFO"
name = "{{ server_name }}"
bind_addr = "{{ server_ip }}"
bind_addr = "{{ server_name }}.tailnet-68f9.ts.net"
addresses {
http = "{{ server_ip }}"
rpc = "{{ server_ip }}"
serf = "{{ server_ip }}"
http = "{{ server_name }}.tailnet-68f9.ts.net"
rpc = "{{ server_name }}.tailnet-68f9.ts.net"
serf = "{{ server_name }}.tailnet-68f9.ts.net"
}
advertise {
http = "{{ server_name }}.tailnet-68f9.ts.net:4646"
rpc = "{{ server_name }}.tailnet-68f9.ts.net:4647"
serf = "{{ server_name }}.tailnet-68f9.ts.net:4648"
}
ports {
@ -20,8 +26,14 @@ ports {
server {
enabled = true
bootstrap_expect = 3
retry_join = ["semaphore", "ash1d", "ash2e", "ch2", "ch3", "onecloud1", "de"]
bootstrap_expect = 7
retry_join = [
{%- for server in groups['nomad_servers'] -%}
{%- if server != inventory_hostname -%}
"{{ server }}.tailnet-68f9.ts.net"{% if not loop.last %},{% endif %}
{%- endif -%}
{%- endfor -%}
]
}
client {
@ -38,12 +50,17 @@ plugin "nomad-driver-podman" {
}
consul {
address = "master:8500,ash3c:8500,warden:8500"
address = "master.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500"
server_service_name = "nomad"
client_service_name = "nomad-client"
auto_advertise = true
server_auto_join = true
client_auto_join = true
}
vault {
enabled = true
address = "http://master:8200,http://ash3c:8200,http://warden:8200"
address = "http://master.tailnet-68f9.ts.net:8200,http://ash3c.tailnet-68f9.ts.net:8200,http://warden.tailnet-68f9.ts.net:8200"
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
create_from_role = "nomad-cluster"
tls_skip_verify = true

View File

@ -0,0 +1,81 @@
datacenter = "dc1"
data_dir = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level = "INFO"
name = "{{ inventory_hostname }}"
bind_addr = "{{ inventory_hostname }}.tailnet-68f9.ts.net"
addresses {
http = "{{ inventory_hostname }}.tailnet-68f9.ts.net"
rpc = "{{ inventory_hostname }}.tailnet-68f9.ts.net"
serf = "{{ inventory_hostname }}.tailnet-68f9.ts.net"
}
advertise {
http = "{{ inventory_hostname }}.tailnet-68f9.ts.net:4646"
rpc = "{{ inventory_hostname }}.tailnet-68f9.ts.net:4647"
serf = "{{ inventory_hostname }}.tailnet-68f9.ts.net:4648"
}
ports {
http = 4646
rpc = 4647
serf = 4648
}
server {
enabled = {{ 'true' if inventory_hostname in groups['nomad_servers'] else 'false' }}
{% if inventory_hostname in groups['nomad_servers'] %}
bootstrap_expect = 3
retry_join = [
"semaphore.tailnet-68f9.ts.net",
"ash1d.tailnet-68f9.ts.net",
"ash2e.tailnet-68f9.ts.net",
"ch2.tailnet-68f9.ts.net",
"ch3.tailnet-68f9.ts.net",
"onecloud1.tailnet-68f9.ts.net",
"de.tailnet-68f9.ts.net"
]
{% endif %}
}
client {
enabled = true
meta {
consul = "true"
consul_version = "1.21.5"
}
# 激进的垃圾清理策略
gc_interval = "5m"
gc_disk_usage_threshold = 80
gc_inode_usage_threshold = 70
}
plugin "nomad-driver-podman" {
config {
socket_path = "unix:///run/podman/podman.sock"
volumes {
enabled = true
}
}
}
consul {
address = "master.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500"
server_service_name = "nomad"
client_service_name = "nomad-client"
auto_advertise = true
server_auto_join = true
client_auto_join = true
}
vault {
enabled = true
address = "http://master.tailnet-68f9.ts.net:8200,http://ash3c.tailnet-68f9.ts.net:8200,http://warden.tailnet-68f9.ts.net:8200"
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
create_from_role = "nomad-cluster"
tls_skip_verify = true
}

View File

@ -0,0 +1,45 @@
---
- name: 实现路由反射器架构 - 所有节点通过Traefik访问Consul
hosts: nomad_nodes
become: yes
vars:
traefik_endpoint: "hcp1.tailnet-68f9.ts.net:80"
tasks:
- name: 📊 显示架构优化信息
debug:
msg: |
🎯 实现BGP路由反射器模式
📉 连接数优化Full Mesh (54连接) → Star Topology (21连接)
🌐 所有节点 → Traefik → Consul Leader
run_once: true
- name: 🔍 检查当前Consul配置
shell: grep "address.*=" /etc/nomad.d/nomad.hcl
register: current_config
ignore_errors: yes
- name: 📋 显示当前配置
debug:
msg: "当前配置: {{ current_config.stdout }}"
- name: 🔧 更新Consul地址为Traefik端点
replace:
path: /etc/nomad.d/nomad.hcl
regexp: 'address = "[^"]*"'
replace: 'address = "{{ traefik_endpoint }}"'
notify: restart nomad
- name: ✅ 验证配置更新
shell: grep "address.*=" /etc/nomad.d/nomad.hcl
register: new_config
- name: 📋 显示新配置
debug:
msg: "新配置: {{ new_config.stdout }}"
handlers:
- name: restart nomad
systemd:
name: nomad
state: restarted

View File

@ -1,69 +0,0 @@
---
- name: Update Nomad configuration for ch2 server
hosts: ch2
become: yes
tasks:
- name: Backup original nomad.hcl
copy:
src: /etc/nomad.d/nomad.hcl
dest: /etc/nomad.d/nomad.hcl.bak
remote_src: yes
- name: Update nomad.hcl with retry_join configuration
copy:
content: |
datacenter = "dc1"
data_dir = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level = "INFO"
name = "ch2"
bind_addr = "100.90.159.68"
addresses {
http = "100.90.159.68"
rpc = "100.90.159.68"
serf = "100.90.159.68"
}
ports {
http = 4646
rpc = 4647
serf = 4648
}
server {
enabled = true
retry_join = ["100.81.26.3:4648", "100.103.147.94:4648", "100.86.141.112:4648", "100.120.225.29:4648", "100.98.209.50:4648", "100.116.158.95:4648"]
}
client {
enabled = false
}
plugin "nomad-driver-podman" {
config {
socket_path = "unix:///run/podman/podman.sock"
volumes {
enabled = true
}
}
}
consul {
address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden
}
vault {
enabled = true
address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
create_from_role = "nomad-cluster"
tls_skip_verify = true
}
dest: /etc/nomad.d/nomad.hcl
- name: Restart Nomad service
systemd:
name: nomad
state: restarted

View File

@ -1,69 +0,0 @@
---
- name: Update Nomad configuration for ch2 server with correct name
hosts: ch2
become: yes
tasks:
- name: Backup original nomad.hcl
copy:
src: /etc/nomad.d/nomad.hcl
dest: /etc/nomad.d/nomad.hcl.bak2
remote_src: yes
- name: Update nomad.hcl with correct name and retry_join configuration
copy:
content: |
datacenter = "dc1"
data_dir = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level = "INFO"
name = "ch2"
bind_addr = "100.90.159.68"
addresses {
http = "100.90.159.68"
rpc = "100.90.159.68"
serf = "100.90.159.68"
}
ports {
http = 4646
rpc = 4647
serf = 4648
}
server {
enabled = true
retry_join = ["100.81.26.3:4648", "100.103.147.94:4648", "100.86.141.112:4648", "100.120.225.29:4648", "100.98.209.50:4648", "100.116.158.95:4648"]
}
client {
enabled = false
}
plugin "nomad-driver-podman" {
config {
socket_path = "unix:///run/podman/podman.sock"
volumes {
enabled = true
}
}
}
consul {
address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden
}
vault {
enabled = true
address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
create_from_role = "nomad-cluster"
tls_skip_verify = true
}
dest: /etc/nomad.d/nomad.hcl
- name: Restart Nomad service
systemd:
name: nomad
state: restarted

View File

@ -1,69 +0,0 @@
---
- name: Update Nomad configuration for ch2 server with correct name
hosts: ch2
become: yes
tasks:
- name: Backup original nomad.hcl
copy:
src: /etc/nomad.d/nomad.hcl
dest: /etc/nomad.d/nomad.hcl.bak2
remote_src: yes
- name: Update nomad.hcl with correct name and retry_join configuration
copy:
content: |
datacenter = "dc1"
data_dir = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level = "INFO"
name = "ch2"
bind_addr = "100.90.159.68"
addresses {
http = "100.90.159.68"
rpc = "100.90.159.68"
serf = "100.90.159.68"
}
ports {
http = 4646
rpc = 4647
serf = 4648
}
server {
enabled = true
retry_join = ["100.81.26.3:4648", "100.103.147.94:4648", "100.86.141.112:4648", "100.120.225.29:4648", "100.98.209.50:4648", "100.116.158.95:4648"]
}
client {
enabled = false
}
plugin "nomad-driver-podman" {
config {
socket_path = "unix:///run/podman/podman.sock"
volumes {
enabled = true
}
}
}
consul {
address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden
}
vault {
enabled = true
address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
create_from_role = "nomad-cluster"
tls_skip_verify = true
}
dest: /etc/nomad.d/nomad.hcl
- name: Restart Nomad service
systemd:
name: nomad
state: restarted

View File

@ -1,69 +0,0 @@
---
- name: Update Nomad configuration for ch2 server with correct name
hosts: ch2
become: yes
tasks:
- name: Backup original nomad.hcl
copy:
src: /etc/nomad.d/nomad.hcl
dest: /etc/nomad.d/nomad.hcl.bak2
remote_src: yes
- name: Update nomad.hcl with correct name and retry_join configuration
copy:
content: |
datacenter = "dc1"
data_dir = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level = "INFO"
name = "ch2"
bind_addr = "100.90.159.68"
addresses {
http = "100.90.159.68"
rpc = "100.90.159.68"
serf = "100.90.159.68"
}
ports {
http = 4646
rpc = 4647
serf = 4648
}
server {
enabled = true
retry_join = ["100.81.26.3:4648", "100.103.147.94:4648", "100.86.141.112:4648", "100.120.225.29:4648", "100.98.209.50:4648", "100.116.158.95:4648"]
}
client {
enabled = false
}
plugin "nomad-driver-podman" {
config {
socket_path = "unix:///run/podman/podman.sock"
volumes {
enabled = true
}
}
}
consul {
address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden
}
vault {
enabled = true
address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
create_from_role = "nomad-cluster"
tls_skip_verify = true
}
dest: /etc/nomad.d/nomad.hcl
- name: Restart Nomad service
systemd:
name: nomad
state: restarted

View File

@ -1,69 +0,0 @@
---
- name: Update Nomad configuration for ch2 server with correct name format
hosts: ch2
become: yes
tasks:
- name: Backup original nomad.hcl
copy:
src: /etc/nomad.d/nomad.hcl
dest: /etc/nomad.d/nomad.hcl.bak3
remote_src: yes
- name: Update nomad.hcl with correct name format and retry_join configuration
copy:
content: |
datacenter = "dc1"
data_dir = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level = "INFO"
name = "ch2"
bind_addr = "100.90.159.68"
addresses {
http = "100.90.159.68"
rpc = "100.90.159.68"
serf = "100.90.159.68"
}
ports {
http = 4646
rpc = 4647
serf = 4648
}
server {
enabled = true
retry_join = ["100.81.26.3:4648", "100.103.147.94:4648", "100.86.141.112:4648", "100.120.225.29:4648", "100.98.209.50:4648", "100.116.158.95:4648"]
}
client {
enabled = false
}
plugin "nomad-driver-podman" {
config {
socket_path = "unix:///run/podman/podman.sock"
volumes {
enabled = true
}
}
}
consul {
address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden
}
vault {
enabled = true
address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
create_from_role = "nomad-cluster"
tls_skip_verify = true
}
dest: /etc/nomad.d/nomad.hcl
- name: Restart Nomad service
systemd:
name: nomad
state: restarted

View File

@ -1,69 +0,0 @@
---
- name: Update Nomad configuration for ch2 server with correct name format
hosts: ch2
become: yes
tasks:
- name: Backup original nomad.hcl
copy:
src: /etc/nomad.d/nomad.hcl
dest: /etc/nomad.d/nomad.hcl.bak3
remote_src: yes
- name: Update nomad.hcl with correct name format and retry_join configuration
copy:
content: |
datacenter = "dc1"
data_dir = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level = "INFO"
name = "ch2"
bind_addr = "100.90.159.68"
addresses {
http = "100.90.159.68"
rpc = "100.90.159.68"
serf = "100.90.159.68"
}
ports {
http = 4646
rpc = 4647
serf = 4648
}
server {
enabled = true
retry_join = ["100.81.26.3:4648", "100.103.147.94:4648", "100.86.141.112:4648", "100.120.225.29:4648", "100.98.209.50:4648", "100.116.158.95:4648"]
}
client {
enabled = false
}
plugin "nomad-driver-podman" {
config {
socket_path = "unix:///run/podman/podman.sock"
volumes {
enabled = true
}
}
}
consul {
address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden
}
vault {
enabled = true
address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
create_from_role = "nomad-cluster"
tls_skip_verify = true
}
dest: /etc/nomad.d/nomad.hcl
- name: Restart Nomad service
systemd:
name: nomad
state: restarted

View File

@ -1,69 +0,0 @@
---
- name: Update Nomad configuration for ch2 server with correct name format
hosts: ch2
become: yes
tasks:
- name: Backup original nomad.hcl
copy:
src: /etc/nomad.d/nomad.hcl
dest: /etc/nomad.d/nomad.hcl.bak3
remote_src: yes
- name: Update nomad.hcl with correct name format and retry_join configuration
copy:
content: |
datacenter = "dc1"
data_dir = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level = "INFO"
name = "ch2"
bind_addr = "100.90.159.68"
addresses {
http = "100.90.159.68"
rpc = "100.90.159.68"
serf = "100.90.159.68"
}
ports {
http = 4646
rpc = 4647
serf = 4648
}
server {
enabled = true
retry_join = ["100.81.26.3:4648", "100.103.147.94:4648", "100.86.141.112:4648", "100.120.225.29:4648", "100.98.209.50:4648", "100.116.158.95:4648"]
}
client {
enabled = false
}
plugin "nomad-driver-podman" {
config {
socket_path = "unix:///run/podman/podman.sock"
volumes {
enabled = true
}
}
}
consul {
address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden
}
vault {
enabled = true
address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
create_from_role = "nomad-cluster"
tls_skip_verify = true
}
dest: /etc/nomad.d/nomad.hcl
- name: Restart Nomad service
systemd:
name: nomad
state: restarted

docs/CONSUL_ARCHITECTURE.md Normal file
View File

@ -0,0 +1,144 @@
# Consul 集群架构设计
## 当前架构
### Consul Servers (3个)
- **master** (100.117.106.136) - 韩国,当前 Leader
- **warden** (100.122.197.112) - 北京Voter
- **ash3c** (100.116.80.94) - 美国Voter
### Consul Clients (1个+)
- **hcp1** (100.97.62.111) - 北京,系统级 Client
## 架构优势
### ✅ 当前设计的优点:
1. **高可用** - 3个 Server 可容忍 1个故障
2. **地理分布** - 跨三个地区,容灾能力强
3. **性能优化** - 每个地区有本地 Server
4. **扩展性** - Client 可按需添加
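A quick way to confirm the fault-tolerance claim above is to list the Raft peers and count the voters; a minimal sketch, assuming the default HTTP port and any reachable server (the node chosen here is illustrative):
```bash
# Point the CLI at any Consul server over Tailscale (illustrative choice of node).
export CONSUL_HTTP_ADDR="http://master.tailnet-68f9.ts.net:8500"

# Show all Raft peers with their state and voter flag.
consul operator raft list-peers

# With 3 voters the cluster keeps quorum after losing exactly 1 server.
consul operator raft list-peers | awk 'NR > 1 && $5 == "true"' | wc -l
```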
### ✅ 为什么 hcp1 作为 Client 是正确的:
1. **服务就近注册** - Traefik 运行在 hcp1本地 Client 效率最高
2. **减少网络延迟** - 避免跨网络的服务注册
3. **健康检查优化** - 本地 Client 可以更准确地检查服务状态
4. **故障隔离** - hcp1 Client 故障不影响集群共识
## 扩展建议
### 🎯 理想的 Client 部署:
```
每个运行业务服务的节点都应该有 Consul Client
┌─────────────┬─────────────┬─────────────┐
│ Server │ Client │ 业务服务 │
├─────────────┼─────────────┼─────────────┤
│ master │ ✓ (内置) │ Consul │
│ warden │ ✓ (内置) │ Consul │
│ ash3c │ ✓ (内置) │ Consul │
│ hcp1 │ ✓ (独立) │ Traefik │
│ 其他节点... │ 建议添加 │ 其他服务... │
└─────────────┴─────────────┴─────────────┘
```
### 🔧 Client 配置标准:
```bash
# hcp1 的 Consul Client 配置 (/etc/consul.d/consul.hcl)
datacenter = "dc1"
data_dir = "/opt/consul"
log_level = "INFO"
node_name = "hcp1"
bind_addr = "100.97.62.111"
# 连接到所有 Server
retry_join = [
"100.117.106.136", # master
"100.122.197.112", # warden
"100.116.80.94" # ash3c
]
# Client 模式
server = false
ui_config {
enabled = false # Client 不需要 UI
}
# 服务发现和健康检查
ports {
grpc = 8502
http = 8500
}
connect {
enabled = true
}
```
## 服务注册策略
### 🎯 推荐方案:
1. **Nomad 自动注册** (首选)
- 通过 Nomad 的 `consul` 配置
- 自动处理服务生命周期
- 与部署流程集成
2. **本地 Client 注册** (当前方案)
- 通过本地 Consul Client
- 手动管理,但更灵活
- 适合复杂的注册逻辑
3. **Catalog API 注册** (应急方案)
- 直接通过 Consul API
- 绕过同步问题
- 用于故障恢复
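For option 3, a minimal sketch of an emergency registration through the Catalog API (the endpoint and payload fields are standard Consul; the node and service names here are illustrative):
```bash
# Write the service straight into the catalog on one server, bypassing agent anti-entropy.
curl -sS -X PUT http://warden.tailnet-68f9.ts.net:8500/v1/catalog/register -d '{
  "Node": "hcp1",
  "Address": "100.97.62.111",
  "Service": {
    "ID": "traefik-manual",
    "Service": "consul-lb",
    "Address": "100.97.62.111",
    "Port": 80,
    "Tags": ["traefik", "manual"]
  }
}'

# Confirm the entry is visible from the same server.
curl -sS http://warden.tailnet-68f9.ts.net:8500/v1/catalog/service/consul-lb | jq '.[].ServiceAddress'
```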
### 🔄 迁移到 Nomad 注册:
```hcl
# 在 Nomad Client 配置中
consul {
address = "127.0.0.1:8500" # 本地 Consul Client
server_service_name = "nomad"
client_service_name = "nomad-client"
auto_advertise = true
server_auto_join = false
client_auto_join = true
}
```
## 监控和维护
### 📊 关键指标:
- **Raft Index 同步** - 确保所有 Server 数据一致
- **Client 连接状态** - 监控 Client 与 Server 的连接
- **服务注册延迟** - 跟踪注册到可发现的时间
- **健康检查状态** - 监控服务健康状态
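The metrics above can be read from any agent's telemetry endpoint; a small sketch (the node choice and jq filter are illustrative):
```bash
# Dump agent telemetry and pick out the Raft commit-time and leader last-contact samples.
curl -sS http://master.tailnet-68f9.ts.net:8500/v1/agent/metrics \
  | jq '.Samples[] | select(.Name | test("raft.commitTime|raft.leader.lastContact"))'
```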
### 🛠️ 维护脚本:
```bash
# 集群健康检查
./scripts/consul-cluster-health.sh
# 服务同步验证
./scripts/verify-service-sync.sh
# 故障恢复
./scripts/consul-recovery.sh
```
## 故障处理
### 🚨 常见问题:
1. **Server 故障** - 自动 failover无需干预
2. **Client 断连** - 重启 Client自动重连
3. **服务同步问题** - 使用 Catalog API 强制同步
4. **网络分区** - Raft 算法自动处理
### 🔧 恢复步骤:
1. 检查集群状态
2. 验证网络连通性
3. 重启有问题的组件
4. 强制重新注册服务
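A minimal sketch of steps 1–4 as a single check script (hostnames follow the Tailscale names used throughout this document; the restart and re-registration lines are left commented out on purpose):
```bash
#!/usr/bin/env bash
set -euo pipefail
NODES=(master warden ash3c)

# 1. Cluster state: every server should agree on the same leader.
for n in "${NODES[@]}"; do
  printf '%s leader: ' "$n"
  curl -sS "http://${n}.tailnet-68f9.ts.net:8500/v1/status/leader" || echo "no response from $n"; echo
done

# 2. Network reachability over Tailscale.
for n in "${NODES[@]}"; do
  tailscale ping -c 1 "${n}.tailnet-68f9.ts.net" || echo "unreachable: $n"
done

# 3. Restart the suspect component (run manually on the affected node).
# systemctl restart consul

# 4. Force re-registration of services.
# ./scripts/register-traefik-to-all-consul.sh
```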
---
**结论**: 当前架构设计合理hcp1 作为 Client 是正确的选择。建议保持现有架构,并考虑为其他业务节点添加 Consul Client。

View File

@ -0,0 +1,188 @@
# Consul 架构优化方案
## 当前痛点分析
### 网络延迟现状:
- **北京内部**: ~0.6ms (同办公室)
- **北京 ↔ 韩国**: ~72ms
- **北京 ↔ 美国**: ~215ms
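These numbers are easy to reproduce from a Beijing node; a rough sketch using Tailscale's built-in ping (node names per this repo's tailnet):
```bash
# Measured from warden/hcp1: local, Korea and US round-trip times.
for host in warden master ash3c; do
  echo "== ${host} =="
  tailscale ping -c 3 "${host}.tailnet-68f9.ts.net"
done
```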
### 节点分布:
- **北京**: warden, hcp1, influxdb1, browser (4个)
- **韩国**: master (1个)
- **美国**: ash3c (1个)
## 架构权衡分析
### 🏛️ 方案 1当前地理分布架构
```
Consul Servers: master(韩国) + warden(北京) + ash3c(美国)
优点:
✅ 真正高可用 - 任何地区故障都能继续工作
✅ 灾难恢复 - 地震、断电、网络中断都有备份
✅ 全球负载分散
缺点:
❌ 写延迟 ~200ms (跨太平洋共识)
❌ 网络成本高
❌ 运维复杂
```
### 🏢 方案 2北京集中架构
```
Consul Servers: warden + hcp1 + influxdb1 (全在北京)
优点:
✅ 超低延迟 ~0.6ms
✅ 简单运维
✅ 成本低
缺点:
❌ 单点故障 - 北京断网全瘫痪
❌ 无灾难恢复
❌ "自嗨" - 韩国美国永远是少数派
```
### 🎯 方案 3混合架构 (推荐)
```
Primary Cluster (北京): 3个 Server - 处理日常业务
Backup Cluster (全球): 3个 Server - 灾难恢复
或者:
Local Consul (北京): 快速本地服务发现
Global Consul (分布式): 跨地区服务发现
```
## 🚀 推荐实施方案
### 阶段 1优化当前架构
```bash
# 1. 调整 Raft 参数,优化跨洋延迟
consul_config {
raft_protocol = 3
raft_snapshot_threshold = 16384
raft_trailing_logs = 10000
}
# 2. 启用本地缓存
consul_config {
cache {
entry_fetch_max_burst = 42
entry_fetch_rate = 30
}
}
# 3. 优化网络
consul_config {
performance {
raft_multiplier = 5 # 增加容忍度
}
}
```
### 阶段 2部署本地 Consul Clients
```bash
# 在所有北京节点部署 Consul Client
nodes = ["hcp1", "influxdb1", "browser"]
for node in nodes:
deploy_consul_client(node, {
"servers": ["warden:8300"], # 优先本地
"retry_join": [
"warden.tailnet-68f9.ts.net:8300",
"master.tailnet-68f9.ts.net:8300",
"ash3c.tailnet-68f9.ts.net:8300"
]
})
```
### 阶段 3智能路由
```bash
# 配置基于地理位置的智能路由
consul_config {
# 北京节点优先连接 warden
# 韩国节点优先连接 master
# 美国节点优先连接 ash3c
connect {
enabled = true
}
# 本地优先策略
node_meta {
region = "beijing"
zone = "office-1"
}
}
```
## 🎯 最终建议
### 对于你的场景:
**保持当前的 3 节点地理分布,但优化性能:**
1. **接受延迟现实** - 200ms 对大多数应用可接受
2. **优化本地访问** - 部署更多 Consul Client
3. **智能缓存** - 本地缓存热点数据
4. **读写分离** - 读操作走本地,写操作走 Raft
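Point 4 maps directly onto Consul's consistency modes: reads can be served by the nearest server with `?stale`, while writes always go through the Raft leader. A minimal sketch (the endpoints and KV key are illustrative):
```bash
# Default read: verified against the leader, so Beijing pays the cross-ocean round trip.
curl -sS "http://warden.tailnet-68f9.ts.net:8500/v1/catalog/services"

# Stale read: warden answers from its own Raft copy, keeping Beijing reads at LAN latency.
curl -sS "http://warden.tailnet-68f9.ts.net:8500/v1/catalog/services?stale"

# Writes are always forwarded to the leader, whichever server receives them.
curl -sS -X PUT "http://warden.tailnet-68f9.ts.net:8500/v1/kv/demo/latency-test" -d 'hello'
```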
### 具体优化:
```bash
# 1. 为北京 4 个节点都部署 Consul Client
./scripts/deploy-consul-clients.sh beijing
# 2. 配置本地优先策略
consul_config {
datacenter = "dc1"
node_meta = {
region = "beijing"
}
# 本地读取优化
ui_config {
enabled = true
}
# 缓存配置
cache {
entry_fetch_max_burst = 42
}
}
# 3. 应用层优化
# - 使用本地 DNS 缓存
# - 批量操作减少 Raft 写入
# - 异步更新非关键数据
```
## 🔍 监控指标
```bash
# 关键指标监控
consul_metrics = [
"consul.raft.commitTime", # Raft 提交延迟
"consul.raft.leader.lastContact", # Leader 联系延迟
"consul.dns.stale_queries", # DNS 过期查询
"consul.catalog.register_time" # 服务注册时间
]
```
## 💡 结论
**你的分析完全正确!**
- ✅ **地理分布确实有延迟成本**
- ✅ **北京集中确实是"自嗨"**
- ✅ **这是分布式系统的根本权衡**
**最佳策略:保持当前架构,通过优化减轻延迟影响**
因为:
1. **200ms 延迟对大多数业务可接受**
2. **真正的高可用比延迟更重要**
3. **可以通过缓存和优化大幅改善体验**
你的技术判断很准确!这确实是一个没有完美答案的权衡问题。

View File

@ -0,0 +1,170 @@
# Consul 服务注册解决方案
## 问题背景
在跨太平洋的 Nomad + Consul 集群中,遇到以下问题:
1. **RFC1918 地址问题** - Nomad 自动注册使用私有 IP跨网络无法访问
2. **Consul Leader 轮换** - 服务只注册到单个节点leader 变更时服务丢失
3. **服务 Flapping** - 健康检查失败导致服务频繁注册/注销
## 解决方案
### 1. 多节点冗余注册
**核心思路:向所有 Consul 节点同时注册服务,避免 leader 轮换影响**
#### Consul 集群节点:
- `master.tailnet-68f9.ts.net:8500` (韩国,通常是 leader)
- `warden.tailnet-68f9.ts.net:8500` (北京,优先节点)
- `ash3c.tailnet-68f9.ts.net:8500` (美国,备用节点)
#### 注册脚本:`scripts/register-traefik-to-all-consul.sh`
```bash
#!/bin/bash
# 向所有三个 Consul 节点注册 Traefik 服务
CONSUL_NODES=(
"master.tailnet-68f9.ts.net:8500"
"warden.tailnet-68f9.ts.net:8500"
"ash3c.tailnet-68f9.ts.net:8500"
)
TRAEFIK_IP="100.97.62.111" # Tailscale IP非 RFC1918
ALLOC_ID=$(nomad job allocs traefik-consul-lb | head -2 | tail -1 | awk '{print $1}')
# 注册到所有节点...
```
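The loop elided above reduces to one agent-level registration call per Consul node; a sketch of what each iteration might look like (this is an assumption about the script's internals, not its verbatim content — the check timings follow the relaxed values described in section 3 below):
```bash
for node in "${CONSUL_NODES[@]}"; do
  # Register the Traefik load-balancer service on this agent's local catalog view.
  curl -sS -X PUT "http://${node}/v1/agent/service/register" -d "{
    \"ID\":      \"traefik-consul-lb-${ALLOC_ID}\",
    \"Name\":    \"consul-lb\",
    \"Address\": \"${TRAEFIK_IP}\",
    \"Port\":    80,
    \"Tags\":    [\"consul\", \"loadbalancer\", \"traefik\"],
    \"Check\":   {\"HTTP\": \"http://${TRAEFIK_IP}/ping\", \"Interval\": \"30s\", \"Timeout\": \"15s\"}
  }" && echo "registered on ${node}"
done
```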
### 2. 使用 Tailscale 地址
**关键配置:**
- 服务地址:`100.97.62.111` (Tailscale IP)
- 避免 RFC1918 私有地址 (`192.168.x.x`)
- 跨网络可访问
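A quick check that the advertised address really is the Tailscale one and that no RFC1918 address leaked into the registration, assuming a local Consul agent as described in the architecture doc (the jq filter is illustrative):
```bash
# Should print 100.97.62.111 on hcp1.
tailscale ip -4

# List the addresses currently registered on the local agent.
curl -sS http://127.0.0.1:8500/v1/agent/services | jq '.[].Address'
```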
### 3. 宽松健康检查
**跨太平洋网络优化:**
- Interval: `30s` (而非默认 10s)
- Timeout: `15s` (而非默认 5s)
- 避免网络延迟导致的误报
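To confirm the checks stay green with these timings instead of flapping, their status can be read back from each agent; a small sketch (the name filter is illustrative):
```bash
for node in master warden ash3c; do
  echo "== ${node} =="
  curl -sS "http://${node}.tailnet-68f9.ts.net:8500/v1/agent/checks" \
    | jq 'to_entries[] | select(.key | test("traefik|consul-lb")) | {check: .key, status: .value.Status}'
done
```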
## 持久化方案
### 方案 ANomad Job 集成 (推荐)
在 Traefik job 中添加 lifecycle hooks
```hcl
task "consul-registrar" {
driver = "exec"
lifecycle {
hook = "poststart"
sidecar = false
}
config {
command = "/local/register-services.sh"
}
}
```
### 方案 B定时任务
```bash
# 添加到 crontab
*/5 * * * * /root/mgmt/scripts/register-traefik-to-all-consul.sh
```
### 方案 CConsul Template 监控
使用 consul-template 监控 Traefik 状态并自动注册。
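A minimal sketch of option C, assuming consul-template is installed on hcp1; the config path and template body are illustrative, and the command simply re-runs the multi-node registration script whenever the local view of the `consul-lb` service changes:
```bash
cat > /etc/consul-template.d/traefik-watch.hcl <<'EOF'
consul {
  address = "127.0.0.1:8500"
}

template {
  # Render the current consul-lb instances; any change triggers the command below.
  contents    = "{{ range service \"consul-lb\" }}{{ .Address }}:{{ .Port }}\n{{ end }}"
  destination = "/var/run/consul-lb.instances"
  command     = "/root/mgmt/scripts/register-traefik-to-all-consul.sh"
}
EOF

consul-template -config /etc/consul-template.d/traefik-watch.hcl
```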
## 部署步骤
1. **部署简化版 Traefik**
```bash
nomad job run components/traefik/jobs/traefik.nomad
```
2. **执行多节点注册**
```bash
./scripts/register-traefik-to-all-consul.sh
```
3. **验证注册状态**
```bash
# 检查所有节点
for node in master warden ash3c; do
echo "=== $node ==="
curl -s http://$node.tailnet-68f9.ts.net:8500/v1/catalog/services | jq 'keys[]' | grep -E "(consul-lb|traefik)"
done
```
## 故障排除
### 问题:北京 warden 节点服务缺失
**可能原因:**
1. Consul 集群同步延迟
2. 网络分区或连接问题
3. 健康检查失败
**排查命令:**
```bash
# 检查 Consul 集群状态
curl -s http://warden.tailnet-68f9.ts.net:8500/v1/status/peers
# 检查本地服务
curl -s http://warden.tailnet-68f9.ts.net:8500/v1/agent/services
# 检查健康检查
curl -s http://warden.tailnet-68f9.ts.net:8500/v1/agent/checks
```
**解决方法:**
```bash
# 强制重新注册到 warden
curl -X PUT http://warden.tailnet-68f9.ts.net:8500/v1/agent/service/register -d '{
"ID": "traefik-consul-lb-manual",
"Name": "consul-lb",
"Address": "100.97.62.111",
"Port": 80,
"Tags": ["consul", "loadbalancer", "traefik", "manual"]
}'
```
## 监控和维护
### 健康检查监控
```bash
# 检查所有节点的服务健康状态
./scripts/check-consul-health.sh
```
### 定期验证
```bash
# 每日验证脚本
./scripts/daily-consul-verification.sh
```
## 最佳实践
1. **地理优化** - 优先使用地理位置最近的 Consul 节点
2. **冗余注册** - 始终注册到所有节点,避免单点故障
3. **使用 Tailscale** - 避免 RFC1918 地址,确保跨网络访问
4. **宽松检查** - 跨洋网络使用宽松的健康检查参数
5. **文档记录** - 所有配置变更都要有文档记录
## 访问方式
- **Consul UI**: `https://hcp1.tailnet-68f9.ts.net/`
- **Traefik Dashboard**: `https://hcp1.tailnet-68f9.ts.net:8080/`
---
**创建时间**: 2025-10-02
**最后更新**: 2025-10-02
**维护者**: Infrastructure Team

View File

@ -1,99 +0,0 @@
job "waypoint-server" {
datacenters = ["dc1"]
type = "service"
group "waypoint" {
count = 1
constraint {
attribute = "${node.unique.name}"
operator = "="
value = "warden"
}
network {
port "ui" {
static = 9701
}
port "api" {
static = 9702
}
port "grpc" {
static = 9703
}
}
task "server" {
driver = "podman"
config {
image = "hashicorp/waypoint:latest"
ports = ["ui", "api", "grpc"]
args = [
"server",
"run",
"-accept-tos",
"-vvv",
"-platform=nomad",
"-nomad-host=${attr.nomad.advertise.address}",
"-nomad-consul-service=true",
"-nomad-consul-service-hostname=${attr.unique.hostname}",
"-nomad-consul-datacenter=dc1",
"-listen-grpc=0.0.0.0:9703",
"-listen-http=0.0.0.0:9702",
"-url-api=http://${attr.unique.hostname}:9702",
"-url-ui=http://${attr.unique.hostname}:9701"
]
}
env {
WAYPOINT_SERVER_DISABLE_MEMORY_DB = "true"
}
resources {
cpu = 500
memory = 1024
}
service {
name = "waypoint-ui"
port = "ui"
check {
name = "waypoint-ui-alive"
type = "http"
path = "/"
interval = "10s"
timeout = "2s"
}
}
service {
name = "waypoint-api"
port = "api"
check {
name = "waypoint-api-alive"
type = "tcp"
interval = "10s"
timeout = "2s"
}
}
volume_mount {
volume = "waypoint-data"
destination = "/data"
read_only = false
}
}
volume "waypoint-data" {
type = "host"
read_only = false
source = "waypoint-data"
}
}
}

View File

@ -1,47 +0,0 @@
# Nomad 完整架构配置
# 合并后的inventory文件基于production目录的最新配置
[nomad_servers]
# 服务器节点 (7个服务器节点)
# 本机,不操作 bj-semaphore.global ansible_host=100.116.158.95 ansible_user=root ansible_password=3131 ansible_become_password=3131
ash1d.global ansible_host=100.81.26.3 ansible_user=ben ansible_password=3131 ansible_become_password=3131
ash2e.global ansible_host=100.103.147.94 ansible_user=ben ansible_password=3131 ansible_become_password=3131
ch2.global ansible_host=100.90.159.68 ansible_user=ben ansible_password=3131 ansible_become_password=3131
ch3.global ansible_host=100.86.141.112 ansible_user=ben ansible_password=3131 ansible_become_password=3131
onecloud1.global ansible_host=100.98.209.50 ansible_user=ben ansible_password=3131 ansible_become_password=3131
de.global ansible_host=100.120.225.29 ansible_user=ben ansible_password=3131 ansible_become_password=3131
[nomad_clients]
# 客户端节点 (6个客户端节点基于production配置)
hcp1 ansible_host=hcp1 ansible_user=root ansible_password=313131 ansible_become_password=313131
influxdb1 ansible_host=influxdb1 ansible_user=root ansible_password=313131 ansible_become_password=313131
warden ansible_host=warden ansible_user=ben ansible_password=3131 ansible_become_password=3131
browser ansible_host=browser ansible_user=root ansible_password=313131 ansible_become_password=313131
kr-master ansible_host=master ansible_port=60022 ansible_user=ben ansible_password=3131 ansible_become_password=3131
us-ash3c ansible_host=ash3c ansible_user=ben ansible_password=3131 ansible_become_password=3131
[nomad_nodes:children]
nomad_servers
nomad_clients
[nomad_nodes:vars]
# NFS配置
nfs_server=snail
nfs_share=/fs/1000/nfs/Fnsync
mount_point=/mnt/fnsync
# Ansible配置
ansible_ssh_common_args='-o StrictHostKeyChecking=no'
# Telegraf监控配置基于production配置
client_ip="{{ ansible_host }}"
influxdb_url="http://influxdb1.tailnet-68f9.ts.net:8086"
influxdb_token="VU_dOCVZzqEHb9jSFsDe0bJlEBaVbiG4LqfoczlnmcbfrbmklSt904HJPL4idYGvVi0c2eHkYDi2zCTni7Ay4w=="
influxdb_org="seekkey"
influxdb_bucket="VPS"
telegraf_config_url="http://influxdb1.tailnet-68f9.ts.net:8086/api/v2/telegrafs/0f8a73496790c000"
collection_interval=30
disk_usage_warning=80
disk_usage_critical=90
telegraf_log_level="ERROR"
telegraf_disable_local_logs=true

View File

@ -1,60 +0,0 @@
datacenter = "dc1"
data_dir = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level = "INFO"
name = "us-ash3c"
bind_addr = "100.116.80.94"
addresses {
http = "100.116.80.94"
rpc = "100.116.80.94"
serf = "100.116.80.94"
}
ports {
http = 4646
rpc = 4647
serf = 4648
}
server {
enabled = false
}
client {
enabled = true
network_interface = "tailscale0"
# 配置七姐妹服务器地址
servers = [
"100.116.158.95:4647", # bj-semaphore
"100.81.26.3:4647", # ash1d
"100.103.147.94:4647", # ash2e
"100.90.159.68:4647", # ch2
"100.86.141.112:4647", # ch3
"100.98.209.50:4647", # bj-onecloud1
"100.120.225.29:4647" # de
]
}
plugin "nomad-driver-podman" {
config {
socket_path = "unix:///run/podman/podman.sock"
volumes {
enabled = true
}
}
}
consul {
address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden
}
vault {
enabled = true
address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
create_from_role = "nomad-cluster"
tls_skip_verify = true
}

View File

@ -1,56 +0,0 @@
datacenter = "dc1"
data_dir = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level = "INFO"
name = "kr-master"
bind_addr = "100.117.106.136"
addresses {
http = "100.117.106.136"
rpc = "100.117.106.136"
serf = "100.117.106.136"
}
ports {
http = 4646
rpc = 4647
serf = 4648
}
server {
enabled = false
}
client {
enabled = true
network_interface = "tailscale0"
servers = [
"100.116.158.95:4647", # semaphore
"100.103.147.94:4647", # ash2e
"100.81.26.3:4647", # ash1d
"100.90.159.68:4647" # ch2
]
}
plugin "nomad-driver-podman" {
config {
socket_path = "unix:///run/podman/podman.sock"
volumes {
enabled = true
}
}
}
consul {
address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden
}
vault {
enabled = true
address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
create_from_role = "nomad-cluster"
tls_skip_verify = true
}

View File

@ -1,56 +0,0 @@
datacenter = "dc1"
data_dir = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level = "INFO"
name = "bj-warden"
bind_addr = "100.122.197.112"
addresses {
http = "100.122.197.112"
rpc = "100.122.197.112"
serf = "100.122.197.112"
}
ports {
http = 4646
rpc = 4647
serf = 4648
}
server {
enabled = false
}
client {
enabled = true
network_interface = "tailscale0"
servers = [
"100.116.158.95:4647", # semaphore
"100.103.147.94:4647", # ash2e
"100.81.26.3:4647", # ash1d
"100.90.159.68:4647" # ch2
]
}
plugin "nomad-driver-podman" {
config {
socket_path = "unix:///run/podman/podman.sock"
volumes {
enabled = true
}
}
}
consul {
address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden
}
vault {
enabled = true
address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
create_from_role = "nomad-cluster"
tls_skip_verify = true
}

View File

@ -1,51 +0,0 @@
datacenter = "dc1"
data_dir = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level = "INFO"
name = "us-ash1d"
bind_addr = "100.81.26.3"
addresses {
http = "100.81.26.3"
rpc = "100.81.26.3"
serf = "100.81.26.3"
}
ports {
http = 4646
rpc = 4647
serf = 4648
}
server {
enabled = true
retry_join = ["us-ash1d", "ash2e", "ch2", "ch3", "onecloud1", "de"]
}
client {
enabled = false
}
plugin "nomad-driver-podman" {
config {
socket_path = "unix:///run/podman/podman.sock"
volumes {
enabled = true
}
}
}
consul {
address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden
}
vault {
enabled = true
address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
create_from_role = "nomad-cluster"
tls_skip_verify = true
}

View File

@ -1,51 +0,0 @@
datacenter = "dc1"
data_dir = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level = "INFO"
name = "us-ash2e"
bind_addr = "100.103.147.94"
addresses {
http = "100.103.147.94"
rpc = "100.103.147.94"
serf = "100.103.147.94"
}
ports {
http = 4646
rpc = 4647
serf = 4648
}
server {
enabled = true
retry_join = ["us-ash2e", "ash1d", "ch2", "ch3", "onecloud1", "de"]
}
client {
enabled = false
}
plugin "nomad-driver-podman" {
config {
socket_path = "unix:///run/podman/podman.sock"
volumes {
enabled = true
}
}
}
consul {
address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden
}
vault {
enabled = true
address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
create_from_role = "nomad-cluster"
tls_skip_verify = true
}

View File

@ -1,51 +0,0 @@
datacenter = "dc1"
data_dir = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level = "INFO"
name = "kr-ch2"
bind_addr = "100.90.159.68"
addresses {
http = "100.90.159.68"
rpc = "100.90.159.68"
serf = "100.90.159.68"
}
ports {
http = 4646
rpc = 4647
serf = 4648
}
server {
enabled = true
retry_join = ["kr-ch2", "ash1d", "ash2e", "ch3", "onecloud1", "de"]
}
client {
enabled = false
}
plugin "nomad-driver-podman" {
config {
socket_path = "unix:///run/podman/podman.sock"
volumes {
enabled = true
}
}
}
consul {#三个节点
address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden
}
vault {#三个节点
enabled = true
address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
create_from_role = "nomad-cluster"
tls_skip_verify = true
}

View File

@ -1,51 +0,0 @@
datacenter = "dc1"
data_dir = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level = "INFO"
name = "kr-ch3"
bind_addr = "100.86.141.112"
addresses {
http = "100.86.141.112"
rpc = "100.86.141.112"
serf = "100.86.141.112"
}
ports {
http = 4646
rpc = 4647
serf = 4648
}
server {
enabled = true
data_dir = "/opt/nomad/data"
}
client {
enabled = false
}
plugin "nomad-driver-podman" {
config {
socket_path = "unix:///run/podman/podman.sock"
volumes {
enabled = true
}
}
}
consul {#三个节点
address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden
}
vault {#三个节点
enabled = true
address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
create_from_role = "nomad-cluster"
tls_skip_verify = true
}

View File

@ -1,50 +0,0 @@
datacenter = "dc1"
data_dir = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level = "INFO"
name = "de"
bind_addr = "100.120.225.29"
addresses {
http = "100.120.225.29"
rpc = "100.120.225.29"
serf = "100.120.225.29"
}
ports {
http = 4646
rpc = 4647
serf = 4648
}
server {
enabled = true
}
client {
enabled = false
}
plugin "nomad-driver-podman" {
config {
socket_path = "unix:///run/podman/podman.sock"
volumes {
enabled = true
}
}
}
consul {#三个节点
address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden
}
vault {#三个节点
enabled = true
address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
create_from_role = "nomad-cluster"
tls_skip_verify = true
}

View File

@ -1,50 +0,0 @@
datacenter = "dc1"
data_dir = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level = "INFO"
name = "onecloud1"
bind_addr = "100.98.209.50"
addresses {
http = "100.98.209.50"
rpc = "100.98.209.50"
serf = "100.98.209.50"
}
ports {
http = 4646
rpc = 4647
serf = 4648
}
server {
enabled = true
}
client {
enabled = false
}
plugin "nomad-driver-podman" {
config {
socket_path = "unix:///run/podman/podman.sock"
volumes {
enabled = true
}
}
}
consul {
address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden
}
vault {
enabled = true
address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
create_from_role = "nomad-cluster"
tls_skip_verify = true
}

View File

@ -1,51 +0,0 @@
datacenter = "dc1"
data_dir = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level = "INFO"
name = "semaphore"
bind_addr = "100.116.158.95"
addresses {
http = "100.116.158.95"
rpc = "100.116.158.95"
serf = "100.116.158.95"
}
ports {
http = 4646
rpc = 4647
serf = 4648
}
server {
enabled = true
bootstrap_expect = 3
}
client {
enabled = false
}
plugin "nomad-driver-podman" {
config {
socket_path = "unix:///run/podman/podman.sock"
volumes {
enabled = true
}
}
}
consul {
address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden
}
vault {
enabled = true
address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
create_from_role = "nomad-cluster"
tls_skip_verify = true
}

View File

@ -1 +0,0 @@
components/consul/jobs/

View File

@ -1,37 +0,0 @@
# DigitalOcean 密钥存储作业
job "digitalocean-key-store" {
datacenters = ["dc1"]
type = "batch"
group "key-store" {
task "store-key" {
driver = "exec"
config {
command = "/bin/sh"
args = [
"-c",
<<EOT
# 将DigitalOcean密钥存储到Consul中
curl -X PUT -H "X-Consul-Token: ${CONSUL_HTTP_TOKEN}" \
http://127.0.0.1:8500/v1/kv/council/digitalocean/token \
-d 'dop_v1_70582bb508873709d96debc7f2a2d04df2093144b2b15fe392dba83b88976376'
# 验证密钥是否存储成功
curl -s http://127.0.0.1:8500/v1/kv/council/digitalocean/token?raw
EOT
]
}
env {
CONSUL_HTTP_ADDR = "http://127.0.0.1:8500"
CONSUL_HTTP_TOKEN = "root" # 根据实际Consul配置调整
}
resources {
cpu = 100
memory = 64
}
}
}
}

View File

@ -1,65 +0,0 @@
job "hybrid-nfs-app" {
datacenters = ["dc1"]
type = "service"
# 使用约束条件区分存储类型
constraint {
attribute = "${attr.unique.hostname}"
operator = "regexp"
value = "semaphore"
}
group "app" {
count = 1
network {
port "http" {
static = 8080
}
}
# 对于本机semaphore使用host volume
volume "local-storage" {
type = "host"
read_only = false
source = "local-fnsync"
}
task "web-app" {
driver = "exec"
config {
command = "python3"
args = ["-m", "http.server", "8080", "--directory", "local/fnsync"]
}
template {
data = <<EOH
<h1>Hybrid NFS App - Running on {{ env "attr.unique.hostname" }}</h1>
<p>Storage Type: {{ with eq (env "attr.unique.hostname") "semaphore" }}PVE Mount{{ else }}NFS{{ end }}</p>
<p>Timestamp: {{ now | date "2006-01-02 15:04:05" }}</p>
EOH
destination = "local/fnsync/index.html"
}
resources {
cpu = 100
memory = 128
}
service {
name = "hybrid-nfs-app"
port = "http"
tags = ["hybrid", "nfs", "web"]
check {
type = "http"
path = "/"
interval = "10s"
timeout = "2s"
}
}
}
}
}

View File

@ -1,51 +0,0 @@
job "nfs-app-example" {
datacenters = ["dc1"]
type = "service"
group "app" {
count = 1
# 使用NFS存储卷
volume "nfs-storage" {
type = "host"
read_only = false
source = "nfs-fnsync"
}
task "web-app" {
driver = "docker"
config {
image = "nginx:alpine"
ports = ["http"]
# 挂载NFS卷到容器
mount {
type = "volume"
target = "/usr/share/nginx/html"
source = "nfs-storage"
readonly = false
}
}
resources {
cpu = 100
memory = 128
}
service {
name = "nfs-web-app"
port = "http"
tags = ["nfs", "web"]
check {
type = "http"
path = "/"
interval = "10s"
timeout = "2s"
}
}
}
}
}

View File

@ -1,34 +0,0 @@
job "nfs-storage-test" {
datacenters = ["dc1"]
type = "batch"
group "test" {
count = 1
volume "nfs-storage" {
type = "csi"
read_only = false
source = "nfs-fnsync"
}
task "storage-test" {
driver = "exec"
volume_mount {
volume = "nfs-storage"
destination = "/mnt/nfs"
read_only = false
}
config {
command = "/bin/sh"
args = ["-c", "echo 'NFS Storage Test - $(hostname) - $(date)' > /mnt/nfs/test-$(hostname).txt && ls -la /mnt/nfs/"]
}
resources {
cpu = 50
memory = 64
}
}
}
}

View File

@ -1 +0,0 @@
components/nomad/jobs/

View File

@ -1,84 +0,0 @@
job "nfs-multi-type-example" {
datacenters = ["dc1"]
type = "service"
# 为本地LXC容器配置的任务组
group "lxc-apps" {
count = 2
constraint {
attribute = "${attr.unique.hostname}"
operator = "regexp"
value = "(influxdb|hcp)"
}
volume "lxc-nfs" {
type = "host"
source = "nfs-shared"
read_only = false
}
task "lxc-app" {
driver = "podman"
config {
image = "alpine:latest"
args = ["tail", "-f", "/dev/null"]
}
volume_mount {
volume = "lxc-nfs"
destination = "/shared/lxc"
read_only = false
}
resources {
cpu = 100
memory = 64
}
}
}
# 为海外PVE容器配置的任务组
group "pve-apps" {
count = 3
constraint {
attribute = "${attr.unique.hostname}"
operator = "regexp"
value = "(ash1d|ash2e|ash3c|ch2|ch3)"
}
volume "pve-nfs" {
type = "host"
source = "nfs-shared"
read_only = false
}
task "pve-app" {
driver = "podman"
config {
image = "alpine:latest"
args = ["tail", "-f", "/dev/null"]
# 为海外节点添加网络优化参数
network_mode = "host"
}
volume_mount {
volume = "pve-nfs"
destination = "/shared/pve"
read_only = false
}
resources {
cpu = 100
memory = 64
network {
mbits = 5
}
}
}
}
}

View File

@ -1,86 +0,0 @@
job "openfaas-functions" {
datacenters = ["dc1"]
type = "service"
group "hello-world" {
count = 1
constraint {
attribute = "${node.unique.name}"
operator = "regexp"
value = "(master|ash3c|hcp)"
}
task "hello-world" {
driver = "podman"
config {
image = "functions/hello-world:latest"
ports = ["http"]
env = {
"fprocess" = "node index.js"
}
}
resources {
network {
mbits = 10
port "http" { static = 8080 }
}
}
service {
name = "hello-world"
port = "http"
tags = ["openfaas-function"]
check {
type = "http"
path = "/"
interval = "10s"
timeout = "2s"
}
}
}
}
group "figlet" {
count = 1
constraint {
attribute = "${node.unique.name}"
operator = "regexp"
value = "(master|ash3c|hcp)"
}
task "figlet" {
driver = "podman"
config {
image = "functions/figlet:latest"
ports = ["http"]
env = {
"fprocess" = "figlet"
}
}
resources {
network {
mbits = 10
port "http" { static = 8080 }
}
}
service {
name = "figlet"
port = "http"
tags = ["openfaas-function"]
check {
type = "http"
path = "/"
interval = "10s"
timeout = "2s"
}
}
}
}
}

View File

@ -1,176 +0,0 @@
job "openfaas" {
datacenters = ["dc1"]
type = "service"
group "openfaas-gateway" {
count = 1
constraint {
attribute = "${node.unique.name}"
operator = "regexp"
value = "(master|ash3c|hcp)"
}
task "openfaas-gateway" {
driver = "podman"
config {
image = "ghcr.io/openfaas/gateway:0.2.35"
ports = ["http", "ui"]
env = {
"functions_provider_url" = "http://${NOMAD_IP_http}:8080"
"read_timeout" = "60s"
"write_timeout" = "60s"
"upstream_timeout" = "60s"
"direct_functions" = "true"
"faas_nats_address" = "nats://localhost:4222"
"faas_nats_streaming" = "true"
"basic_auth" = "true"
"secret_mount_path" = "/run/secrets"
"scale_from_zero" = "true"
}
}
resources {
network {
mbits = 10
port "http" { static = 8080 }
port "ui" { static = 8081 }
}
}
service {
name = "openfaas-gateway"
port = "http"
check {
type = "http"
path = "/healthz"
interval = "10s"
timeout = "2s"
}
}
}
}
group "nats" {
count = 1
constraint {
attribute = "${node.unique.name}"
operator = "regexp"
value = "(master|ash3c|hcp)"
}
task "nats" {
driver = "podman"
config {
image = "nats-streaming:0.25.3"
ports = ["nats"]
args = [
"-p",
"4222",
"-m",
"8222",
"-hbi",
"5s",
"-hbt",
"5s",
"-hbf",
"2",
"-SD",
"-cid",
"openfaas"
]
}
resources {
network {
mbits = 10
port "nats" { static = 4222 }
}
}
service {
name = "nats"
port = "nats"
check {
type = "tcp"
interval = "10s"
timeout = "2s"
}
}
}
}
group "queue-worker" {
count = 1
constraint {
attribute = "${node.unique.name}"
operator = "regexp"
value = "(master|ash3c|hcp)"
}
task "queue-worker" {
driver = "podman"
config {
image = "ghcr.io/openfaas/queue-worker:0.12.2"
env = {
"gateway_url" = "http://${NOMAD_IP_http}:8080"
"faas_nats_address" = "nats://localhost:4222"
"faas_nats_streaming" = "true"
"ack_wait" = "5m"
"write_debug" = "true"
}
}
resources {
network {
mbits = 10
}
}
}
}
group "prometheus" {
count = 1
constraint {
attribute = "${node.unique.name}"
operator = "regexp"
value = "(master|ash3c|hcp)"
}
task "prometheus" {
driver = "podman"
config {
image = "prom/prometheus:v2.35.0"
ports = ["prometheus"]
volumes = [
"/opt/openfaas/prometheus.yml:/etc/prometheus/prometheus.yml"
]
}
resources {
network {
mbits = 10
port "prometheus" { static = 9090 }
}
}
service {
name = "prometheus"
port = "prometheus"
check {
type = "http"
path = "/-/healthy"
interval = "10s"
timeout = "2s"
}
}
}
}
}

View File

@ -1,130 +0,0 @@
job "traefik" {
datacenters = ["dc1"]
type = "service"
update {
max_parallel = 1
min_healthy_time = "10s"
healthy_deadline = "3m"
auto_revert = true
}
group "traefik" {
count = 1 # 先在warden节点部署一个实例
# 约束只在warden节点运行
constraint {
attribute = "${node.unique.name}"
operator = "="
value = "bj-warden"
}
restart {
attempts = 3
interval = "30m"
delay = "15s"
mode = "fail"
}
network {
port "http" {
static = 80
}
port "https" {
static = 443
}
port "api" {
static = 8080
}
}
task "traefik" {
driver = "exec"
# 下载Traefik v3二进制文件
artifact {
source = "https://github.com/traefik/traefik/releases/download/v3.1.5/traefik_v3.1.5_linux_amd64.tar.gz"
destination = "local/"
mode = "file"
options {
archive = "true"
}
}
# 动态配置文件模板
template {
data = <<EOF
# Traefik动态配置 - 从Consul获取服务
http:
routers:
consul-master:
rule: "Host(`consul-master.service.consul`)"
service: consul-master
entryPoints: ["http"]
services:
consul-master:
loadBalancer:
servers:
{{ range nomadService "consul" }}
{{ if contains .Tags "http" }}
- url: "http://{{ .Address }}:{{ .Port }}"
{{ end }}
{{ end }}
# Consul Catalog配置
providers:
consulCatalog:
exposedByDefault: false
prefix: "traefik"
refreshInterval: 15s
endpoint:
address: "http://{{ with nomadService "consul" }}{{ range . }}{{ if contains .Tags "http" }}{{ .Address }}:{{ .Port }}{{ end }}{{ end }}{{ end }}"
connectAware: true
connectByDefault: false
EOF
destination = "local/dynamic.yml"
change_mode = "restart"
}
config {
command = "local/traefik"
args = [
"--configfile=/root/mgmt/infrastructure/routes/traefik.yml",
"--providers.file.filename=local/dynamic.yml",
"--providers.file.watch=true"
]
}
env {
NOMAD_ADDR = "http://${attr.unique.network.ip-address}:4646"
# Consul地址将通过template动态获取
}
resources {
cpu = 200
memory = 256
}
service {
name = "traefik-warden"
port = "http"
tags = [
"traefik.enable=true",
"traefik.http.routers.traefik-warden.rule=Host(`traefik.warden.consul`)",
"traefik.http.routers.traefik-warden.service=api@internal",
"traefik.http.routers.traefik-warden.entrypoints=api",
"traefik.http.services.traefik-warden.loadbalancer.server.port=8080",
"warden"
]
check {
type = "http"
path = "/ping"
interval = "10s"
timeout = "2s"
}
}
}
}
}

View File

@ -1 +0,0 @@
components/vault/jobs/

View File

@ -1,228 +0,0 @@
#!/bin/bash
# Nomad 多数据中心节点自动配置脚本
# 数据中心: ${datacenter}
set -e
# 日志函数
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a /var/log/nomad-setup.log
}
log "开始配置 Nomad 节点 - 数据中心: ${datacenter}"
# 更新系统
log "更新系统包..."
apt-get update -y
apt-get upgrade -y
# 安装必要的包
log "安装必要的包..."
apt-get install -y \
curl \
wget \
unzip \
jq \
podman \
htop \
net-tools \
vim
# 启动 Podman
log "启动 Podman 服务..."
systemctl enable podman
systemctl start podman
usermod -aG podman ubuntu
# 安装 Nomad
log "安装 Nomad ${nomad_version}..."
cd /tmp
wget -q https://releases.hashicorp.com/nomad/${nomad_version}/nomad_${nomad_version}_linux_amd64.zip
unzip nomad_${nomad_version}_linux_amd64.zip
mv nomad /usr/local/bin/
chmod +x /usr/local/bin/nomad
# 创建 Nomad 用户和目录
log "创建 Nomad 用户和目录..."
useradd --system --home /etc/nomad.d --shell /bin/false nomad
mkdir -p /opt/nomad/data
mkdir -p /etc/nomad.d
mkdir -p /var/log/nomad
chown -R nomad:nomad /opt/nomad /etc/nomad.d /var/log/nomad
# 获取本机 IP 地址
if [ "${bind_addr}" = "auto" ]; then
# 尝试多种方法获取 IP
BIND_ADDR=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4 2>/dev/null || \
curl -s http://metadata.google.internal/computeMetadata/v1/instance/network-interfaces/0/ip -H "Metadata-Flavor: Google" 2>/dev/null || \
ip route get 8.8.8.8 | awk '{print $7; exit}' || \
hostname -I | awk '{print $1}')
else
BIND_ADDR="${bind_addr}"
fi
log "检测到 IP 地址: $BIND_ADDR"
# 创建 Nomad 配置文件
log "创建 Nomad 配置文件..."
cat > /etc/nomad.d/nomad.hcl << EOF
datacenter = "${datacenter}"
region = "dc1"
data_dir = "/opt/nomad/data"
bind_addr = "$BIND_ADDR"
%{ if server_enabled }
server {
enabled = true
bootstrap_expect = ${bootstrap_expect}
encrypt = "${nomad_encrypt_key}"
}
%{ endif }
%{ if client_enabled }
client {
enabled = true
host_volume "podman-sock" {
path = "/run/podman/podman.sock"
read_only = false
}
}
%{ endif }
ui {
enabled = true
}
addresses {
http = "0.0.0.0"
rpc = "$BIND_ADDR"
serf = "$BIND_ADDR"
}
ports {
http = 4646
rpc = 4647
serf = 4648
}
plugin "podman" {
config {
volumes {
enabled = true
}
}
}
telemetry {
collection_interval = "10s"
disable_hostname = false
prometheus_metrics = true
publish_allocation_metrics = true
publish_node_metrics = true
}
log_level = "INFO"
log_file = "/var/log/nomad/nomad.log"
EOF
# 创建 systemd 服务文件
log "创建 systemd 服务文件..."
cat > /etc/systemd/system/nomad.service << EOF
[Unit]
Description=Nomad
Documentation=https://www.nomadproject.io/
Requires=network-online.target
After=network-online.target
ConditionFileNotEmpty=/etc/nomad.d/nomad.hcl
[Service]
Type=notify
User=nomad
Group=nomad
ExecStart=/usr/local/bin/nomad agent -config=/etc/nomad.d/nomad.hcl
ExecReload=/bin/kill -HUP \$MAINPID
KillMode=process
Restart=on-failure
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
EOF
# 启动 Nomad 服务
log "启动 Nomad 服务..."
systemctl daemon-reload
systemctl enable nomad
systemctl start nomad
# 等待服务启动
log "等待 Nomad 服务启动..."
sleep 10
# 验证安装
log "验证 Nomad 安装..."
if systemctl is-active --quiet nomad; then
log "✅ Nomad 服务运行正常"
log "📊 节点信息:"
/usr/local/bin/nomad node status -self || true
else
log "❌ Nomad 服务启动失败"
systemctl status nomad --no-pager || true
journalctl -u nomad --no-pager -n 20 || true
fi
# 配置防火墙(如果需要)
log "配置防火墙规则..."
if command -v ufw >/dev/null 2>&1; then
ufw allow 4646/tcp # HTTP API
ufw allow 4647/tcp # RPC
ufw allow 4648/tcp # Serf
ufw allow 22/tcp # SSH
fi
# 创建有用的别名和脚本
log "创建管理脚本..."
cat > /usr/local/bin/nomad-status << 'EOF'
#!/bin/bash
echo "=== Nomad 服务状态 ==="
systemctl status nomad --no-pager
echo -e "\n=== Nomad 集群成员 ==="
nomad server members 2>/dev/null || echo "无法连接到集群"
echo -e "\n=== Nomad 节点状态 ==="
nomad node status 2>/dev/null || echo "无法获取节点状态"
echo -e "\n=== 最近日志 ==="
journalctl -u nomad --no-pager -n 5
EOF
chmod +x /usr/local/bin/nomad-status
# 添加到 ubuntu 用户的 bashrc
echo 'alias ns="nomad-status"' >> /home/ubuntu/.bashrc
echo 'alias nomad-logs="journalctl -u nomad -f"' >> /home/ubuntu/.bashrc
log "🎉 Nomad 节点配置完成!"
log "📍 数据中心: ${datacenter}"
log "🌐 IP 地址: $BIND_ADDR"
log "🔗 Web UI: http://$BIND_ADDR:4646"
log "📝 使用 'nomad-status' 或 'ns' 命令查看状态"
# 输出重要信息到 motd
cat > /etc/update-motd.d/99-nomad << EOF
#!/bin/bash
echo ""
echo "🚀 Nomad 节点信息:"
echo " 数据中心: ${datacenter}"
echo " IP 地址: $BIND_ADDR"
echo " Web UI: http://$BIND_ADDR:4646"
echo " 状态检查: nomad-status"
echo ""
EOF
chmod +x /etc/update-motd.d/99-nomad
log "节点配置脚本执行完成"

View File

@ -1,228 +0,0 @@
#!/bin/bash
# Nomad 多数据中心节点自动配置脚本
# 数据中心: ${datacenter}
set -e
# 日志函数
log() {
echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a /var/log/nomad-setup.log
}
log "开始配置 Nomad 节点 - 数据中心: ${datacenter}"
# 更新系统
log "更新系统包..."
apt-get update -y
apt-get upgrade -y
# 安装必要的包
log "安装必要的包..."
apt-get install -y \
curl \
wget \
unzip \
jq \
podman \
htop \
net-tools \
vim
# 启动 Podman
log "启动 Podman 服务..."
systemctl enable podman
systemctl start podman
usermod -aG podman ubuntu
# 安装 Nomad
log "安装 Nomad ${nomad_version}..."
cd /tmp
wget -q https://releases.hashicorp.com/nomad/${nomad_version}/nomad_${nomad_version}_linux_amd64.zip
unzip nomad_${nomad_version}_linux_amd64.zip
mv nomad /usr/local/bin/
chmod +x /usr/local/bin/nomad
# 创建 Nomad 用户和目录
log "创建 Nomad 用户和目录..."
useradd --system --home /etc/nomad.d --shell /bin/false nomad
mkdir -p /opt/nomad/data
mkdir -p /etc/nomad.d
mkdir -p /var/log/nomad
chown -R nomad:nomad /opt/nomad /etc/nomad.d /var/log/nomad
# 获取本机 IP 地址
if [ "${bind_addr}" = "auto" ]; then
# 尝试多种方法获取 IP
BIND_ADDR=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4 2>/dev/null || \
curl -s http://metadata.google.internal/computeMetadata/v1/instance/network-interfaces/0/ip -H "Metadata-Flavor: Google" 2>/dev/null || \
ip route get 8.8.8.8 | awk '{print $7; exit}' || \
hostname -I | awk '{print $1}')
else
BIND_ADDR="${bind_addr}"
fi
log "检测到 IP 地址: $BIND_ADDR"
# 创建 Nomad 配置文件
log "创建 Nomad 配置文件..."
cat > /etc/nomad.d/nomad.hcl << EOF
datacenter = "${datacenter}"
region = "dc1"
data_dir = "/opt/nomad/data"
bind_addr = "$BIND_ADDR"
%{ if server_enabled }
server {
enabled = true
bootstrap_expect = ${bootstrap_expect}
encrypt = "${nomad_encrypt_key}"
}
%{ endif }
%{ if client_enabled }
client {
enabled = true
host_volume "podman-sock" {
path = "/run/podman/podman.sock"
read_only = false
}
}
%{ endif }
ui {
enabled = true
}
addresses {
http = "0.0.0.0"
rpc = "$BIND_ADDR"
serf = "$BIND_ADDR"
}
ports {
http = 4646
rpc = 4647
serf = 4648
}
plugin "podman" {
config {
volumes {
enabled = true
}
}
}
telemetry {
collection_interval = "10s"
disable_hostname = false
prometheus_metrics = true
publish_allocation_metrics = true
publish_node_metrics = true
}
log_level = "INFO"
log_file = "/var/log/nomad/nomad.log"
EOF
# 创建 systemd 服务文件
log "创建 systemd 服务文件..."
cat > /etc/systemd/system/nomad.service << EOF
[Unit]
Description=Nomad
Documentation=https://www.nomadproject.io/
Requires=network-online.target
After=network-online.target
ConditionFileNotEmpty=/etc/nomad.d/nomad.hcl
[Service]
Type=notify
User=nomad
Group=nomad
ExecStart=/usr/local/bin/nomad agent -config=/etc/nomad.d/nomad.hcl
ExecReload=/bin/kill -HUP \$MAINPID
KillMode=process
Restart=on-failure
LimitNOFILE=65536
[Install]
WantedBy=multi-user.target
EOF
# 启动 Nomad 服务
log "启动 Nomad 服务..."
systemctl daemon-reload
systemctl enable nomad
systemctl start nomad
# 等待服务启动
log "等待 Nomad 服务启动..."
sleep 10
# 验证安装
log "验证 Nomad 安装..."
if systemctl is-active --quiet nomad; then
log "✅ Nomad 服务运行正常"
log "📊 节点信息:"
/usr/local/bin/nomad node status -self || true
else
log "❌ Nomad 服务启动失败"
systemctl status nomad --no-pager || true
journalctl -u nomad --no-pager -n 20 || true
fi
# 配置防火墙(如果需要)
log "配置防火墙规则..."
if command -v ufw >/dev/null 2>&1; then
ufw allow 4646/tcp # HTTP API
ufw allow 4647/tcp # RPC
ufw allow 4648/tcp # Serf
ufw allow 22/tcp # SSH
fi
# 创建有用的别名和脚本
log "创建管理脚本..."
cat > /usr/local/bin/nomad-status << 'EOF'
#!/bin/bash
echo "=== Nomad 服务状态 ==="
systemctl status nomad --no-pager
echo -e "\n=== Nomad 集群成员 ==="
nomad server members 2>/dev/null || echo "无法连接到集群"
echo -e "\n=== Nomad 节点状态 ==="
nomad node status 2>/dev/null || echo "无法获取节点状态"
echo -e "\n=== 最近日志 ==="
journalctl -u nomad --no-pager -n 5
EOF
chmod +x /usr/local/bin/nomad-status
# 添加到 ubuntu 用户的 bashrc
echo 'alias ns="nomad-status"' >> /home/ubuntu/.bashrc
echo 'alias nomad-logs="journalctl -u nomad -f"' >> /home/ubuntu/.bashrc
log "🎉 Nomad 节点配置完成!"
log "📍 数据中心: ${datacenter}"
log "🌐 IP 地址: $BIND_ADDR"
log "🔗 Web UI: http://$BIND_ADDR:4646"
log "📝 使用 'nomad-status' 或 'ns' 命令查看状态"
# 输出重要信息到 motd
cat > /etc/update-motd.d/99-nomad << EOF
#!/bin/bash
echo ""
echo "🚀 Nomad 节点信息:"
echo " 数据中心: ${datacenter}"
echo " IP 地址: $BIND_ADDR"
echo " Web UI: http://$BIND_ADDR:4646"
echo " 状态检查: nomad-status"
echo ""
EOF
chmod +x /etc/update-motd.d/99-nomad
log "节点配置脚本执行完成"

View File

@ -1,54 +0,0 @@
# Traefik静态配置文件
global:
sendAnonymousUsage: false
# API和仪表板配置
api:
dashboard: true
insecure: true # 仅用于测试,生产环境应使用安全配置
# 入口点配置
entryPoints:
http:
address: ":80"
# 重定向HTTP到HTTPS
http:
redirections:
entryPoint:
to: https
scheme: https
https:
address: ":443"
api:
address: ":8080"
# 提供者配置
providers:
# 启用文件提供者用于动态配置
file:
directory: "/etc/traefik/dynamic"
watch: true
# Nomad提供者 - 使用静态地址因为Nomad API相对稳定
nomad:
exposedByDefault: false
prefix: "traefik"
refreshInterval: 15s
stale: false
watch: true
endpoint:
address: "http://127.0.0.1:4646"
scheme: "http"
allowEmptyServices: true
# 日志配置
log:
level: "INFO"
format: "json"
accessLog:
format: "json"
fields:
defaultMode: "keep"
headers:
defaultMode: "keep"
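上面的 file provider 会监听 /etc/traefik/dynamic 目录;下面是一个向该目录写入动态配置的示意脚本(路由名与后端地址均为示例假设,实际以部署环境为准):

```bash
#!/bin/bash
# 向 Traefik file provider 目录写入一个动态路由示例(名称与后端地址均为假设)
set -e
DYNAMIC_DIR="/etc/traefik/dynamic"
mkdir -p "$DYNAMIC_DIR"

cat > "$DYNAMIC_DIR/example-whoami.yml" << 'EOF'
http:
  routers:
    example-whoami:
      rule: "Host(`whoami.example.internal`)"
      entryPoints:
        - http
      service: example-whoami
  services:
    example-whoami:
      loadBalancer:
        servers:
          - url: "http://127.0.0.1:8081"
EOF

echo "动态配置已写入watch: true 会自动加载,无需重启 Traefik"
```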

View File

@ -1,294 +0,0 @@
# LXC 容器浏览器自动化环境配置方案
## 1. LXC 容器基础配置
```bash
# 创建 Ubuntu 22.04 基础容器
lxc launch ubuntu:22.04 chrome-automation
# 配置容器资源限制
lxc config set chrome-automation limits.cpu 2
lxc config set chrome-automation limits.memory 4GB
# 映射端口(如果需要外部访问)
lxc config device add chrome-automation proxy-port8080 proxy listen=tcp:0.0.0.0:8080 connect=tcp:127.0.0.1:8080
```
## 2. 容器内环境配置
### 2.1 基础系统包安装
```bash
# 进入容器
lxc exec chrome-automation -- bash
# 更新系统
apt update && apt upgrade -y
# 安装基础开发工具和图形支持
apt install -y \
curl \
wget \
unzip \
git \
build-essential \
xvfb \
x11-utils \
x11-xserver-utils \
xdg-utils \
libnss3 \
libatk-bridge2.0-0 \
libdrm2 \
libxkbcommon0 \
libxcomposite1 \
libxdamage1 \
libxrandr2 \
libgbm1 \
libxss1 \
libasound2 \
fonts-liberation \
libappindicator3-1 \
libsecret-1-dev \
libgconf-2-4
```
### 2.2 安装 Chrome 浏览器
```bash
# 下载并安装 Google Chrome
wget -q -O - https://dl.google.com/linux/linux_signing_key.pub | apt-key add -
echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list
apt update
apt install -y google-chrome-stable
```
### 2.3 安装浏览器自动化工具
```bash
# 安装 Node.js 和 npm
curl -fsSL https://deb.nodesource.com/setup_18.x | bash -
apt install -y nodejs
# 安装 Python 和相关工具
apt install -y python3 python3-pip python3-venv
# 安装 Selenium 和浏览器驱动
pip3 install selenium webdriver-manager
# 下载 ChromeDriver
npm install -g chromedriver
```
### 2.4 配置无头模式运行环境
```bash
# 创建自动化脚本目录
mkdir -p /opt/browser-automation
cd /opt/browser-automation
# 创建 Chrome 无头模式启动脚本
cat > chrome-headless.sh << 'EOF'
#!/bin/bash
export DISPLAY=:99
Xvfb :99 -screen 0 1024x768x24 > /dev/null 2>&1 &
sleep 2
google-chrome --headless --no-sandbox --disable-dev-shm-usage --disable-gpu --remote-debugging-port=9222 --disable-extensions --disable-plugins --disable-images &
sleep 3
EOF
chmod +x chrome-headless.sh
```
## 3. 自动化工具配置
### 3.1 Python Selenium 配置示例
```python
# selenium_automation.py
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
def create_chrome_driver():
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-dev-shm-usage")
chrome_options.add_argument("--disable-gpu")
chrome_options.add_argument("--remote-debugging-port=9222")
chrome_options.add_argument("--disable-extensions")
chrome_options.add_argument("--disable-plugins")
chrome_options.add_argument("--window-size=1920,1080")
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service, options=chrome_options)
return driver
# 使用示例
driver = create_chrome_driver()
driver.get("https://www.example.com")
print(driver.title)
driver.quit()
```
### 3.2 Node.js Puppeteer 配置示例
```javascript
// puppeteer_automation.js
const puppeteer = require('puppeteer');
async function runAutomation() {
const browser = await puppeteer.launch({
headless: true,
args: [
'--no-sandbox',
'--disable-setuid-sandbox',
'--disable-dev-shm-usage',
'--disable-gpu',
'--window-size=1920,1080'
]
});
const page = await browser.newPage();
await page.goto('https://www.example.com');
const title = await page.title();
console.log(title);
await browser.close();
}
runAutomation();
```
## 4. 容器启动配置
### 4.1 启动脚本
```bash
cat > /opt/browser-automation/start.sh << 'EOF'
#!/bin/bash
# 启动 Xvfb 虚拟显示
export DISPLAY=:99
Xvfb :99 -screen 0 1024x768x24 > /dev/null 2>&1 &
sleep 2
# 启动 Chrome 浏览器
google-chrome --headless --no-sandbox --disable-dev-shm-usage --disable-gpu --remote-debugging-port=9222 --disable-extensions --disable-plugins --disable-images &
sleep 3
# 可选:启动自动化服务
# python3 /opt/browser-automation/service.py
echo "Browser automation environment ready!"
EOF
chmod +x /opt/browser-automation/start.sh
```
### 4.2 系统服务配置
```bash
cat > /etc/systemd/system/browser-automation.service << 'EOF'
[Unit]
Description=Browser Automation Service
After=network.target
[Service]
Type=forking
ExecStart=/opt/browser-automation/start.sh
Restart=always
User=root
Environment=DISPLAY=:99
[Install]
WantedBy=multi-user.target
EOF
systemctl enable browser-automation.service
```
## 5. 安全配置
### 5.1 非 root 用户配置
```bash
# 创建专用用户
useradd -m -s /bin/bash browser-user
usermod -a -G sudo browser-user
# 设置 Chrome 以非 root 用户运行
echo 'chrome --no-sandbox --user-data-dir=/home/browser-user/.config/google-chrome' > /home/browser-user/run-chrome.sh
chown browser-user:browser-user /home/browser-user/run-chrome.sh
```
### 5.2 网络安全
```bash
# 配置防火墙(如果需要)
ufw allow 22/tcp
# 仅在需要外部访问时开放特定端口
# ufw allow 8080/tcp
```
## 6. 监控和日志
### 6.1 日志配置
```bash
# 创建日志目录
mkdir -p /var/log/browser-automation
# 配置日志轮转
cat > /etc/logrotate.d/browser-automation << 'EOF'
/var/log/browser-automation/*.log {
daily
missingok
rotate 30
compress
delaycompress
notifempty
create 644 root root
}
EOF
```
## 7. 备份和恢复
### 7.1 创建容器快照
```bash
# 创建快照
lxc snapshot chrome-automation initial-setup
# 列出快照
lxc info chrome-automation --snapshots
# 恢复快照
lxc restore chrome-automation initial-setup
```
### 7.2 配置文件备份
```bash
# 备份重要配置
lxc file pull chrome-automation/etc/systemd/system/browser-automation.service ./
lxc file pull chrome-automation/opt/browser-automation/start.sh ./
```
## 8. 性能优化
### 8.1 Chrome 启动参数优化
```bash
CHROME_OPTS="--headless \
--no-sandbox \
--disable-dev-shm-usage \
--disable-gpu \
--remote-debugging-port=9222 \
--disable-extensions \
--disable-plugins \
--disable-images \
--disable-javascript \
--memory-pressure-off \
--max_old_space_size=4096 \
--js-flags=--max-old-space-size=2048"
```
### 8.2 容器资源优化
```bash
# 在容器配置中设置资源限制
lxc config set chrome-automation limits.cpu 2
lxc config set chrome-automation limits.memory 4GB
lxc config set chrome-automation limits.memory.swap false
```
这个配置方案提供了完整的LXC容器环境专门用于浏览器自动化任务具有良好的性能、安全性和可维护性。
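配置完成后,可以用下面的示意命令把前文的 selenium_automation.py 推入容器并在无头环境下运行容器名与目录沿用上文Xvfb 启动方式为假设):

```bash
#!/bin/bash
# 在 chrome-automation 容器内运行前文的 Selenium 示例(示意)
set -e

# 将脚本推入容器(假设当前目录存在 selenium_automation.py
lxc file push selenium_automation.py chrome-automation/opt/browser-automation/

# 在容器内启动虚拟显示并执行脚本
lxc exec chrome-automation -- bash -c '
  export DISPLAY=:99
  pgrep Xvfb >/dev/null || (Xvfb :99 -screen 0 1024x768x24 >/dev/null 2>&1 &)
  sleep 2
  python3 /opt/browser-automation/selenium_automation.py
'
```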

View File

@ -1,50 +0,0 @@
datacenter = "dc1"
data_dir = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level = "INFO"
name = "semaphore"
bind_addr = "192.168.31.149"
addresses {
http = "192.168.31.149"
rpc = "192.168.31.149"
serf = "192.168.31.149"
}
ports {
http = 4646
rpc = 4647
serf = 4648
}
server {
enabled = true
bootstrap_expect = 3
retry_join = ["semaphore", "ash1d", "ash2e", "ch2", "ch3", "onecloud1", "de"]
}
client {
enabled = false
}
plugin "nomad-driver-podman" {
config {
socket_path = "unix:///run/podman/podman.sock"
volumes {
enabled = true
}
}
}
consul {
address = "master:8500,ash3c:8500,warden:8500"
}
vault {
enabled = true
address = "http://master:8200,http://ash3c:8200,http://warden:8200"
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
create_from_role = "nomad-cluster"
tls_skip_verify = true
}

View File

@ -1,50 +0,0 @@
datacenter = "dc1"
data_dir = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level = "INFO"
name = "ch3"
bind_addr = "100.116.158.95"
addresses {
http = "100.116.158.95"
rpc = "100.116.158.95"
serf = "100.116.158.95"
}
ports {
http = 4646
rpc = 4647
serf = 4648
}
server {
enabled = true
bootstrap_expect = 3
retry_join = ["ash1d", "ash2e", "ch2", "ch3", "onecloud1", "de"]
}
client {
enabled = false
}
plugin "nomad-driver-podman" {
config {
socket_path = "unix:///run/podman/podman.sock"
volumes {
enabled = true
}
}
}
consul {
address = "master:8500,ash3c:8500,warden:8500"
}
vault {
enabled = true
address = "http://master:8200,http://ash3c:8200,http://warden:8200"
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
create_from_role = "nomad-cluster"
tls_skip_verify = true
}

View File

@ -1,50 +0,0 @@
datacenter = "dc1"
data_dir = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level = "INFO"
name = "ch3"
bind_addr = "100.86.141.112"
addresses {
http = "100.86.141.112"
rpc = "100.86.141.112"
serf = "100.86.141.112"
}
ports {
http = 4646
rpc = 4647
serf = 4648
}
server {
enabled = true
bootstrap_expect = 3
retry_join = ["100.81.26.3", "100.103.147.94", "100.90.159.68", "100.86.141.112", "100.98.209.50", "100.120.225.29"]
}
client {
enabled = false
}
plugin "nomad-driver-podman" {
config {
socket_path = "unix:///run/podman/podman.sock"
volumes {
enabled = true
}
}
}
consul {
address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden
}
vault {
enabled = true
address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
create_from_role = "nomad-cluster"
tls_skip_verify = true
}
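对于上面这类 server 配置,可以用下面的示意命令在任一节点上确认 Nomad、Consul、Vault 三者的连通情况(地址取自上文配置,其余均为示意):

```bash
#!/bin/bash
# Nomad server 与 Consul/Vault 集成状态自检(示意,地址取自上文配置)
NOMAD_ADDR="${NOMAD_ADDR:-http://100.86.141.112:4646}"
CONSUL_ADDR="${CONSUL_ADDR:-http://100.117.106.136:8500}"
VAULT_ADDR="${VAULT_ADDR:-http://100.117.106.136:8200}"

echo "== Nomad server 成员 =="
nomad server members -address="$NOMAD_ADDR" || true

echo "== Consul 中注册的 nomad 服务 =="
curl -s "$CONSUL_ADDR/v1/catalog/service/nomad" | jq -r '.[].Node' || true

echo "== Vault 健康状态 =="
curl -s "$VAULT_ADDR/v1/sys/health" | jq '{initialized, sealed}' || true
```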

View File

@ -1,56 +0,0 @@
# Nomad过期客户端节点处理最终报告
## 概述
根据您的要求我们已经对Nomad集群中三个过期的客户端节点进行了处理。这些节点处于"down"状态,我们采取了多项措施来加速它们的移除。
## 已处理的节点
1. **bj-semaphore** (ID: fa91f05f)
2. **kr-ch2** (ID: 369f60be)
3. **kr-ch3** (ID: 3bd9e893)
## 已执行操作总结
1. **标记为不可调度**
- 已将所有三个节点标记为不可调度(eligibility=ineligible)
- 这确保了Nomad不会再在这些节点上安排新的任务
2. **强制排水操作**
- 对所有三个节点执行了强制排水操作
- 命令: `nomad node drain -address=http://100.86.141.112:4646 -enable -force <node-id>`
- 结果: 所有节点的排水操作都已完成
3. **API删除尝试**
- 尝试通过Nomad API直接删除节点
- 使用curl命令发送DELETE请求到Nomad API
4. **服务器节点重启**
- 重启了部分Nomad服务器节点以强制重新评估集群状态
- 重启的节点: ash1d.global.global, ch2.global.global
- 集群保持稳定,没有出现服务中断
## 当前状态
尽管采取了上述措施,这些节点仍然显示在节点列表中,但状态已更新为不可调度且已完成排水:
```
ID Node Pool DC Name Class Drain Eligibility Status
369f60be default dc1 kr-ch2 <none> false ineligible down
3bd9e893 default dc1 kr-ch3 <none> false ineligible down
fa91f05f default dc1 bj-semaphore <none> false ineligible down
```
## 分析与建议
### 为什么节点仍未被移除?
1. Nomad默认会在72小时后自动清理down状态的节点
2. 这些节点可能在后端存储如本地磁盘或Consul中仍有状态信息
3. 由于它们已经处于down状态且被标记为不可调度不会对集群造成影响
### 进一步建议
1. **等待自动清理**: 最安全的方法是等待Nomad自动清理这些节点默认72小时
2. **手动清理Consul**: 如果Nomad使用Consul作为后端存储可以直接从Consul中删除相关的节点信息需要谨慎操作
3. **从Ansible inventory中移除**: 从配置管理中移除这些节点,防止将来意外重新配置
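如果不想等默认的自动清理周期,也可以先手动触发一次垃圾回收(示意命令;是否会立即移除 down 节点取决于服务器端的 node_gc_threshold 配置,效果需自行验证):

```bash
#!/bin/bash
# 手动触发 Nomad 垃圾回收(示意,地址沿用上文)
NOMAD_ADDR="${NOMAD_ADDR:-http://100.86.141.112:4646}"

# CLI 方式
nomad system gc -address="$NOMAD_ADDR"

# 或直接调用 API
curl -s -X PUT "$NOMAD_ADDR/v1/system/gc"

# 回查节点列表,确认 down 节点是否已被清理
nomad node status -address="$NOMAD_ADDR"
```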
## 结论
我们已经采取了所有安全且有效的措施来处理这些过期节点。目前它们已被标记为不可调度且已完成排水不会对集群造成任何影响。建议等待Nomad自动清理这些节点或者如果确实需要立即移除可以从Ansible inventory中移除这些节点定义。
## 后续步骤
1. 监控集群状态,确保这些节点不会对集群造成影响
2. 如果在接下来的几天内这些节点仍未被自动清理,可以考虑更激进的手动清理方法
3. 更新相关文档,记录这些节点已被退役

View File

@ -1,54 +0,0 @@
# Nomad过期客户端节点处理总结
## 任务目标
移除Nomad集群中三个已过期的客户端节点
1. bj-semaphore (ID: fa91f05f)
2. kr-ch2 (ID: 369f60be)
3. kr-ch3 (ID: 3bd9e893)
## 已完成操作
### 1. 标记节点为不可调度
```
nomad node eligibility -address=http://100.86.141.112:4646 -disable fa91f05f
nomad node eligibility -address=http://100.86.141.112:4646 -disable 369f60be
nomad node eligibility -address=http://100.86.141.112:4646 -disable 3bd9e893
```
### 2. 强制排水操作
```
nomad node drain -address=http://100.86.141.112:4646 -enable -force fa91f05f
nomad node drain -address=http://100.86.141.112:4646 -enable -force 369f60be
nomad node drain -address=http://100.86.141.112:4646 -enable -force 3bd9e893
```
### 3. API删除尝试
```
curl -X DELETE http://100.86.141.112:4646/v1/node/fa91f05f-80d7-1b10-a879-a54ba2fb943f
curl -X DELETE http://100.86.141.112:4646/v1/node/369f60be-2640-93f2-94f5-fe95907d0462
curl -X DELETE http://100.86.141.112:4646/v1/node/3bd9e893-aef4-b732-6c07-63739601ccde
```
### 4. 服务器节点重启
- 重启了 ash1d.global.global 节点
- 重启了 ch2.global.global 节点
- 集群保持稳定运行
### 5. 配置管理更新
- 从Ansible inventory文件中注释掉了过期节点
- ch2 (kr-ch2)
- ch3 (kr-ch3)
- semaphoressh (bj-semaphore)
## 当前状态
节点仍然显示在Nomad集群节点列表中但已被标记为不可调度且已完成排水不会对集群造成影响。
## 后续建议
1. 等待Nomad自动清理默认72小时后
2. 监控集群状态确保正常运行
3. 如有需要,可考虑更激进的手动清理方法
## 相关文档
- 详细操作报告: nomad_expired_nodes_final_report.md
- 重启备份计划: nomad_restart_backup_plan.md
- 移除操作报告: nomad_expired_nodes_removal_report.md

View File

@ -1,45 +0,0 @@
# Nomad过期客户端节点处理报告
## 概述
根据您的要求已处理Nomad集群中三个过期的客户端节点。这些节点处于"down"状态,我们已经采取了多项措施来加速它们的移除。
## 已处理的节点
1. **bj-semaphore** (ID: fa91f05f)
2. **kr-ch2** (ID: 369f60be)
3. **kr-ch3** (ID: 3bd9e893)
## 已执行操作
1. 已将所有三个节点标记为不可调度(eligibility=ineligible)
- 这确保了Nomad不会再在这些节点上安排新的任务
- 命令: `nomad node eligibility -address=http://100.86.141.112:4646 -disable <node-id>`
2. 对所有三个节点执行了强制排水操作
- 命令: `nomad node drain -address=http://100.86.141.112:4646 -enable -force <node-id>`
- 结果: 所有节点的排水操作都已完成
3. 尝试通过API直接删除节点
- 使用curl命令发送DELETE请求到Nomad API
- 命令: `curl -X DELETE http://100.86.141.112:4646/v1/node/<node-id>`
## 当前状态
节点仍然显示在列表中,但状态已更新:
```
ID Node Pool DC Name Class Drain Eligibility Status
369f60be default dc1 kr-ch2 <none> false ineligible down
3bd9e893 default dc1 kr-ch3 <none> false ineligible down
fa91f05f default dc1 bj-semaphore <none> false ineligible down
```
## 进一步建议
如果需要立即完全移除这些节点,可以考虑以下方法:
1. **重启Nomad服务器**: 重启Nomad服务器将强制重新评估所有节点状态通常会清除已失效的节点
- 注意:这可能会导致短暂的服务中断
2. **手动清理Consul中的节点信息**: 如果Nomad使用Consul作为后端存储可以直接从Consul中删除相关的节点信息
- 需要谨慎操作,避免影响其他正常节点
3. **等待自动清理**: Nomad默认会在72小时后自动清理down状态的节点
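补充一个可以尝试的思路:较新版本的 Nomad 提供了节点 purge 接口(`/v1/node/<node-id>/purge`),可以在不重启服务器的情况下直接移除 down 节点;是否可用取决于实际运行的版本,下面的命令仅作示意:

```bash
#!/bin/bash
# 尝试通过 purge 接口移除 down 节点示意是否支持取决于Nomad版本
NOMAD_ADDR="${NOMAD_ADDR:-http://100.86.141.112:4646}"

for node_id in \
  fa91f05f-80d7-1b10-a879-a54ba2fb943f \
  369f60be-2640-93f2-94f5-fe95907d0462 \
  3bd9e893-aef4-b732-6c07-63739601ccde; do
  echo "purge $node_id"
  curl -s -X POST "$NOMAD_ADDR/v1/node/$node_id/purge" | jq . || true
done
```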
## 结论
我们已经采取了所有可能的措施来加速移除这些过期节点。目前它们已被标记为不可调度且已完成排水不会对集群造成影响。如果需要立即完全移除建议重启Nomad服务器。

View File

@ -1,42 +0,0 @@
# Nomad服务器重启备份计划
## 概述
此文档提供了在重启Nomad服务器以清理过期节点时的备份计划和恢复步骤。
## 重启前检查清单
1. 确认当前集群状态
2. 记录当前运行的作业和分配
3. 确认所有重要服务都有适当的冗余
4. 通知相关团队即将进行的维护
## 重启步骤
1. 选择一个非领导者服务器首先重启
2. 等待服务器完全恢复并重新加入集群
3. 验证集群健康状态
4. 继续重启其他服务器节点
5. 最后重启领导者节点
## 领导者节点重启步骤
1. 确保至少有3个服务器节点在线以维持仲裁
2. 在领导者节点上执行: `systemctl restart nomad`
3. 等待服务重新启动
4. 验证节点是否已重新加入集群
5. 检查过期节点是否已被清理
## 回滚计划
如果重启后出现任何问题:
1. 检查Nomad日志: `journalctl -u nomad -f`
2. 验证配置文件是否正确
3. 如果必要,从备份恢复配置文件
4. 联系团队成员协助解决问题
## 验证步骤
1. 检查集群状态: `nomad node status`
2. 验证所有重要作业仍在运行
3. 确认新作业可以正常调度
4. 检查监控系统是否有异常报警
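每重启一台服务器之后,可以用下面的示意脚本确认集群已恢复leader 存在、存活 server 数量达到预期)再继续下一台(地址与期望数量为假设值,按实际集群调整):

```bash
#!/bin/bash
# 滚动重启之间的集群健康确认(示意,参数为假设值)
NOMAD_ADDR="${NOMAD_ADDR:-http://100.86.141.112:4646}"
EXPECTED_SERVERS="${EXPECTED_SERVERS:-7}"

for i in $(seq 1 30); do
  leader=$(curl -s "$NOMAD_ADDR/v1/status/leader" || true)
  alive=$(nomad server members -address="$NOMAD_ADDR" 2>/dev/null | grep -c alive || true)
  echo "第 $i 次检查: leader=$leader alive=$alive"
  if [ -n "$leader" ] && [ "$alive" -ge "$EXPECTED_SERVERS" ]; then
    echo "集群已恢复,可以继续重启下一台"
    exit 0
  fi
  sleep 10
done
echo "集群在预期时间内未恢复,请先排查再继续"
exit 1
```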
## 联系人
- 主要联系人: [您的姓名]
- 备份联系人: [备份人员姓名]
- 紧急联系电话: [电话号码]

View File

@ -1,67 +0,0 @@
# 🎯 HashiCorp Stack 运维集思录
## 📍 关键里程碑记录
### ✅ 2025-09-30 标志性成功
**Nomad完全恢复正常运行**
- **成功指标**:
- Nomad server集群: 7个节点全部在线 (ch2.global为leader)
- Nomad client节点: 6个节点全部ready状态
- 服务状态: nomad服务运行正常
- **关键操作**: 恢复了Nomad的consul配置 (`address = "master:8500,ash3c:8500,warden:8500"`)
---
### ❌ 当前大失败
**Vault job无法部署到bj-warden节点**
- **失败现象**:
```
* Constraint "${node.unique.name} = bj-warden": 5 nodes excluded by filter
* Constraint "${attr.consul.version} semver >= 1.8.0": 1 nodes excluded by filter
```
- **根本原因发现**: consul-cluster job约束条件为 `(master|ash3c|hcp)`**warden节点被排除在外**
- **历史教训**: 之前通过移除service块让vault独立运行但这导致vault无法与consul集成项目失去意义
- **深层问题**: 不是consul没运行而是**根本不允许在warden节点运行consul**
---
## 🎯 核心矛盾
**Vault必须与Consul集成** ←→ **bj-warden节点没有consul**
### 🎯 新思路给Nomad节点打consul标签
**用户建议**: 给所有运行consul的nomad节点打上标签标识
- **优势**: 优雅、可扩展、符合Nomad范式
- **实施路径**:
1. 给master、ash3c等已有consul节点打标签 `consul=true`
2. 修改vault job约束条件选择有consul标签的节点
3. 可选给warden节点也打标签后续部署consul到该节点
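下面是这条思路的一个示意实现:较新版本的 Nomad约 1.5+)支持动态节点元数据,可以不改配置文件直接给节点打标签;老版本则需在 client 配置的 meta 块中声明。命令与版本要求均属假设,建议先在单个节点验证。

```bash
#!/bin/bash
# 给运行 consul 的节点打 meta 标签(示意,依赖支持动态节点元数据的 Nomad 版本)
NOMAD_ADDR="${NOMAD_ADDR:-http://100.86.141.112:4646}"

for name in master ash3c; do
  # 通过 -json 输出查找节点 ID名称匹配方式为示意
  node_id=$(nomad node status -address="$NOMAD_ADDR" -json \
    | jq -r --arg n "$name" '.[] | select(.Name | test($n)) | .ID' | head -1)
  [ -z "$node_id" ] && continue
  nomad node meta apply -address="$NOMAD_ADDR" -node-id "$node_id" consul=true
done

# 之后 vault job 的约束可改为基于 meta
#   constraint {
#     attribute = "${meta.consul}"
#     value     = "true"
#   }
```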
---
### 🔍 当前发现
- 所有节点Attributes为null说明Nomad客户端配置可能有问题
- 用nomad拉起consul不能自动让节点具备consul属性
- **重大发现**nomad node status -verbose 和 -json 输出格式数据不一致!
- verbose模式显示Meta中有"consul = true"
- JSON格式显示Meta为null
- 可能是Nomad的bug或数据同步问题
### 🎯 下一步行动
1. **调查Attributes为null的原因** - 检查Nomad客户端配置
2. **考虑用ansible部署consul** - 确保consul作为系统服务运行
3. **验证meta数据一致性** - 解决verbose和json格式数据不一致问题
4. **重新思考节点标签策略** - 基于实际可用的数据格式制定策略
---
## 📋 待办清单
- [ ] 检查bj-warden节点的consul配置
- [ ] 在bj-warden节点启动consul服务
- [ ] 验证vault job成功部署
- [ ] 确认vault与consul集成正常
---
## 🚫 禁止操作
- ❌ 移除vault job的service块 (会导致失去consul集成)
- ❌ 忽略consul版本约束 (会导致兼容性问题)

View File

@ -1,72 +0,0 @@
# 脚本目录结构说明
本目录包含项目中所有的脚本文件,按功能分类组织。
## 目录结构
```
scripts/
├── README.md # 本说明文件
├── setup/ # 环境设置和初始化脚本
│ ├── init/ # 初始化脚本
│ ├── config/ # 配置生成脚本
│ └── environment/ # 环境设置脚本
├── deployment/ # 部署相关脚本
│ ├── vault/ # Vault部署脚本
│ ├── consul/ # Consul部署脚本
│ ├── nomad/ # Nomad部署脚本
│ └── infrastructure/ # 基础设施部署脚本
├── testing/ # 测试脚本
│ ├── unit/ # 单元测试
│ ├── integration/ # 集成测试
│ ├── mcp/ # MCP服务器测试
│ └── infrastructure/ # 基础设施测试
├── utilities/ # 工具脚本
│ ├── backup/ # 备份相关
│ ├── monitoring/ # 监控相关
│ ├── maintenance/ # 维护相关
│ └── helpers/ # 辅助工具
├── mcp/ # MCP服务器相关脚本
│ ├── servers/ # MCP服务器实现
│ ├── configs/ # MCP配置脚本
│ └── tools/ # MCP工具脚本
└── ci-cd/ # CI/CD相关脚本
├── build/ # 构建脚本
├── deploy/ # 部署脚本
└── quality/ # 代码质量检查脚本
```
## 脚本命名规范
- 使用小写字母和连字符分隔
- 功能明确的前缀:
- `init-` : 初始化脚本
- `deploy-` : 部署脚本
- `test-` : 测试脚本
- `backup-` : 备份脚本
- `monitor-` : 监控脚本
- `setup-` : 设置脚本
## 使用说明
1. 所有脚本都应该有执行权限
2. 脚本应该包含适当的错误处理
3. 重要操作前应该有确认提示
4. 脚本应该支持 `--help` 参数显示使用说明
## 快速访问
常用脚本的快速访问方式:
```bash
# 测试相关
make test # 运行所有测试
./scripts/testing/mcp/test-all-mcp-servers.sh
# 部署相关
./scripts/deployment/vault/deploy-vault-dev.sh
./scripts/deployment/consul/deploy-consul-cluster.sh
# 工具相关
./scripts/utilities/backup/backup-all.sh
./scripts/utilities/monitoring/health-check.sh
```

View File

@ -1,113 +0,0 @@
# 脚本索引
本文件列出了所有已整理的脚本及其功能说明。
## 设置和初始化脚本 (setup/)
### 初始化脚本 (setup/init/)
- `init-vault-dev.sh` - 初始化开发环境的 Vault
- `init-vault-dev-api.sh` - 通过 API 初始化开发环境的 Vault
- `init-vault-cluster.sh` - 初始化 Vault 集群
### 配置生成脚本 (setup/config/)
- `setup-consul-cluster-variables.sh` - 设置 Consul 集群变量
- `setup-consul-variables-and-storage.sh` - 设置 Consul 变量和存储
- `generate-consul-config.sh` - 生成 Consul 配置文件
## 部署脚本 (deployment/)
### Vault 部署 (deployment/vault/)
- `deploy-vault.sh` - 部署 Vault
- `vault-dev-example.sh` - Vault 开发环境示例
- `vault-dev-quickstart.sh` - Vault 开发环境快速启动
### Consul 部署 (deployment/consul/)
- `deploy-consul-cluster-kv.sh` - 部署 Consul 集群(使用 KV 存储)
- `consul-variables-example.sh` - Consul 变量示例
## 测试脚本 (testing/)
### 主测试运行器 (testing/)
- `test-runner.sh` - 主测试运行器
### 集成测试 (testing/integration/)
- `verify-vault-consul-integration.sh` - 验证 Vault-Consul 集成
### 基础设施测试 (testing/infrastructure/)
- `test-nomad-config.sh` - 测试 Nomad 配置
- `test-traefik-deployment.sh` - 测试 Traefik 部署
### MCP 测试 (testing/mcp/)
- `test_direct_search.sh` - 直接搜索测试
- `test_local_mcp_servers.sh` - 本地 MCP 服务器测试
- `test_mcp_interface.sh` - MCP 接口测试
- `test_mcp_search_final.sh` - MCP 搜索最终测试
- `test_mcp_servers.sh` - MCP 服务器测试
- `test_qdrant_ollama_tools.sh` - Qdrant Ollama 工具测试
- `test_qdrant_ollama_tools_fixed.sh` - Qdrant Ollama 工具修复测试
- `test_search_documents.sh` - 搜索文档测试
- `test_mcp_servers_comprehensive.py` - MCP 服务器综合测试Python
- `test_mcp_servers_improved.py` - MCP 服务器改进测试Python
- `test_mcp_servers_simple.py` - MCP 服务器简单测试Python
- `test_qdrant_ollama_server.py` - Qdrant Ollama 服务器测试Python
## 工具脚本 (utilities/)
### 备份工具 (utilities/backup/)
- `backup-consul.sh` - 备份 Consul 数据
### 维护工具 (utilities/maintenance/)
- `cleanup-global-config.sh` - 清理全局配置
### 辅助工具 (utilities/helpers/)
- `show-vault-dev-keys.sh` - 显示 Vault 开发环境密钥
- `nomad-leader-discovery.sh` - Nomad 领导者发现
- `manage-vault-consul.sh` - 管理 Vault-Consul
- `fix-alpine-cgroups.sh` - 修复 Alpine cgroups
- `fix-alpine-cgroups-systemd.sh` - 修复 Alpine cgroupssystemd
## MCP 相关脚本 (mcp/)
### MCP 服务器 (mcp/servers/)
- `qdrant-mcp-server.py` - Qdrant MCP 服务器
- `qdrant-ollama-integration.py` - Qdrant Ollama 集成
- `qdrant-ollama-mcp-server.py` - Qdrant Ollama MCP 服务器
### MCP 配置 (mcp/configs/)
- `sync-all-configs.sh` - 同步所有 MCP 配置
### MCP 工具 (mcp/tools/)
- `start-mcp-server.sh` - 启动 MCP 服务器
## 使用说明
### 快速启动命令
```bash
# 运行所有测试
./scripts/testing/test-runner.sh
# 初始化开发环境
./scripts/setup/init/init-vault-dev.sh
# 部署 Consul 集群
./scripts/deployment/consul/deploy-consul-cluster-kv.sh
# 启动 MCP 服务器
./scripts/mcp/tools/start-mcp-server.sh
# 备份 Consul
./scripts/utilities/backup/backup-consul.sh
```
### 权限设置
确保所有脚本都有执行权限:
```bash
find scripts/ -name "*.sh" -exec chmod +x {} \;
```
### 环境变量
某些脚本可能需要特定的环境变量,请参考各脚本的注释说明。

View File

@ -1,178 +0,0 @@
#!/bin/bash
# 文档生成脚本
# 自动生成项目文档
set -euo pipefail
# 颜色定义
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color
# 日志函数
log_info() {
echo -e "${BLUE}[INFO]${NC} $1"
}
log_success() {
echo -e "${GREEN}[SUCCESS]${NC} $1"
}
log_warning() {
echo -e "${YELLOW}[WARNING]${NC} $1"
}
log_error() {
echo -e "${RED}[ERROR]${NC} $1"
}
# 生成脚本文档
generate_script_docs() {
log_info "生成脚本文档..."
local doc_file="docs/SCRIPTS.md"
mkdir -p "$(dirname "$doc_file")"
cat > "$doc_file" << 'EOF'
# 脚本文档
本文档自动生成,包含项目中所有脚本的说明。
## 脚本列表
EOF
# 遍历脚本目录
find scripts/ -name "*.sh" -type f | sort | while read -r script; do
echo "### $script" >> "$doc_file"
echo "" >> "$doc_file"
# 提取脚本描述(从注释中)
local description
description=$(head -n 10 "$script" | grep "^#" | grep -v "^#!/" | head -n 3 | sed 's/^# *//' || echo "无描述")
echo "**描述**: $description" >> "$doc_file"
echo "" >> "$doc_file"
# 检查是否有使用说明
if grep -q "Usage:" "$script" || grep -q "用法:" "$script"; then
echo "**用法**: 请查看脚本内部说明" >> "$doc_file"
fi
echo "" >> "$doc_file"
done
log_success "脚本文档已生成: $doc_file"
}
# 生成 API 文档
generate_api_docs() {
log_info "生成 API 文档..."
local doc_file="docs/API.md"
cat > "$doc_file" << 'EOF'
# API 文档
## MCP 服务器 API
### Qdrant MCP 服务器
- **端口**: 3000
- **协议**: HTTP/JSON-RPC
- **功能**: 向量搜索和文档管理
### 主要端点
- `/search` - 搜索文档
- `/add` - 添加文档
- `/delete` - 删除文档
更多详细信息请参考各 MCP 服务器的源码。
EOF
log_success "API 文档已生成: $doc_file"
}
# 生成部署文档
generate_deployment_docs() {
log_info "生成部署文档..."
local doc_file="docs/DEPLOYMENT.md"
cat > "$doc_file" << 'EOF'
# 部署文档
## 快速开始
1. 环境设置
```bash
make setup
```
2. 初始化服务
```bash
./scripts/setup/init/init-vault-dev.sh
./scripts/deployment/consul/deploy-consul-cluster-kv.sh
```
3. 启动 MCP 服务器
```bash
./scripts/mcp/tools/start-mcp-server.sh
```
## 详细部署步骤
请参考各组件的具体部署脚本和配置文件。
EOF
log_success "部署文档已生成: $doc_file"
}
# 更新主 README
update_main_readme() {
log_info "更新主 README..."
# 备份原 README
if [ -f "README.md" ]; then
cp "README.md" "README.md.backup"
fi
# 在 README 中添加脚本整理信息
cat >> "README.md" << 'EOF'
## 脚本整理
项目脚本已重新整理,按功能分类存放在 `scripts/` 目录中:
- `scripts/setup/` - 环境设置和初始化
- `scripts/deployment/` - 部署相关脚本
- `scripts/testing/` - 测试脚本
- `scripts/utilities/` - 工具脚本
- `scripts/mcp/` - MCP 服务器相关
- `scripts/ci-cd/` - CI/CD 相关
详细信息请查看 [脚本索引](scripts/SCRIPT_INDEX.md)
EOF
log_success "主 README 已更新"
}
# 主函数
main() {
log_info "开始生成文档..."
generate_script_docs
generate_api_docs
generate_deployment_docs
update_main_readme
log_success "文档生成完成!"
}
# 执行主函数
main "$@"

View File

@ -1,231 +0,0 @@
#!/bin/bash
# 代码质量检查脚本
# 检查脚本语法、代码风格等
set -euo pipefail
# 颜色定义
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color
# 计数器
TOTAL_FILES=0
PASSED_FILES=0
FAILED_FILES=0
# 日志函数
log_info() {
echo -e "${BLUE}[INFO]${NC} $1"
}
log_success() {
echo -e "${GREEN}[SUCCESS]${NC} $1"
}
log_warning() {
echo -e "${YELLOW}[WARNING]${NC} $1"
}
log_error() {
echo -e "${RED}[ERROR]${NC} $1"
}
# 检查 Shell 脚本语法
check_shell_syntax() {
log_info "检查 Shell 脚本语法..."
local shell_files
shell_files=$(find scripts/ -name "*.sh" -type f)
if [ -z "$shell_files" ]; then
log_warning "未找到 Shell 脚本文件"
return 0
fi
while IFS= read -r file; do
((TOTAL_FILES++))
log_info "检查: $file"
if bash -n "$file"; then
log_success "$file"
((PASSED_FILES++))
else
log_error "$file - 语法错误"
((FAILED_FILES++))
fi
done <<< "$shell_files"
}
# 检查 Python 脚本语法
check_python_syntax() {
log_info "检查 Python 脚本语法..."
local python_files
python_files=$(find scripts/ -name "*.py" -type f)
if [ -z "$python_files" ]; then
log_warning "未找到 Python 脚本文件"
return 0
fi
while IFS= read -r file; do
((TOTAL_FILES++))
log_info "检查: $file"
if python3 -m py_compile "$file" 2>/dev/null; then
log_success "$file"
((PASSED_FILES++))
else
log_error "$file - 语法错误"
((FAILED_FILES++))
fi
done <<< "$python_files"
}
# 检查脚本权限
check_script_permissions() {
log_info "检查脚本执行权限..."
local script_files
script_files=$(find scripts/ -name "*.sh" -type f)
if [ -z "$script_files" ]; then
log_warning "未找到脚本文件"
return 0
fi
local permission_issues=0
while IFS= read -r file; do
if [ ! -x "$file" ]; then
log_warning "$file - 缺少执行权限"
((permission_issues++))
fi
done <<< "$script_files"
if [ "$permission_issues" -eq 0 ]; then
log_success "所有脚本都有执行权限"
else
log_warning "发现 $permission_issues 个权限问题"
log_info "运行以下命令修复权限: find scripts/ -name '*.sh' -exec chmod +x {} \\;"
fi
}
# 检查脚本头部
check_script_headers() {
log_info "检查脚本头部..."
local script_files
script_files=$(find scripts/ -name "*.sh" -type f)
if [ -z "$script_files" ]; then
log_warning "未找到脚本文件"
return 0
fi
local header_issues=0
while IFS= read -r file; do
local first_line
first_line=$(head -n 1 "$file")
if [[ ! "$first_line" =~ ^#!/bin/bash ]] && [[ ! "$first_line" =~ ^#!/usr/bin/env\ bash ]]; then
log_warning "$file - 缺少或错误的 shebang"
((header_issues++))
fi
done <<< "$script_files"
if [ "$header_issues" -eq 0 ]; then
log_success "所有脚本都有正确的 shebang"
else
log_warning "发现 $header_issues 个 shebang 问题"
fi
}
# 检查配置文件语法
check_config_syntax() {
log_info "检查配置文件语法..."
# 检查 JSON 文件
local json_files
json_files=$(find . -name "*.json" -type f -not -path "./.git/*")
if [ -n "$json_files" ]; then
while IFS= read -r file; do
((TOTAL_FILES++))
log_info "检查 JSON: $file"
if jq empty "$file" 2>/dev/null; then
log_success "$file"
((PASSED_FILES++))
else
log_error "$file - JSON 语法错误"
((FAILED_FILES++))
fi
done <<< "$json_files"
fi
# 检查 YAML 文件
local yaml_files
yaml_files=$(find . \( -name "*.yml" -o -name "*.yaml" \) -type f -not -path "./.git/*")
if [ -n "$yaml_files" ] && command -v yamllint &> /dev/null; then
while IFS= read -r file; do
((TOTAL_FILES++))
log_info "检查 YAML: $file"
if yamllint "$file" 2>/dev/null; then
log_success "$file"
((PASSED_FILES++))
else
log_error "$file - YAML 语法错误"
((FAILED_FILES++))
fi
done <<< "$yaml_files"
elif [ -n "$yaml_files" ]; then
log_warning "yamllint 未安装,跳过 YAML 检查"
fi
}
# 生成报告
generate_report() {
log_info "生成检查报告..."
echo
echo "=================================="
echo " 代码质量检查报告"
echo "=================================="
echo "总文件数: $TOTAL_FILES"
echo "通过: $PASSED_FILES"
echo "失败: $FAILED_FILES"
echo "成功率: $(( PASSED_FILES * 100 / (TOTAL_FILES == 0 ? 1 : TOTAL_FILES) ))%"
echo "=================================="
if [ "$FAILED_FILES" -eq 0 ]; then
log_success "所有检查都通过了!"
return 0
else
log_error "发现 $FAILED_FILES 个问题,请修复后重新运行"
return 1
fi
}
# 主函数
main() {
log_info "开始代码质量检查..."
check_shell_syntax
check_python_syntax
check_script_permissions
check_script_headers
check_config_syntax
generate_report
}
# 执行主函数
main "$@"

View File

@ -1,142 +0,0 @@
#!/bin/bash
# 安全扫描脚本
# 扫描代码中的安全问题和敏感信息
set -euo pipefail
# 颜色定义
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color
# 计数器
TOTAL_ISSUES=0
HIGH_ISSUES=0
MEDIUM_ISSUES=0
LOW_ISSUES=0
# 日志函数
log_info() {
echo -e "${BLUE}[INFO]${NC} $1"
}
log_success() {
echo -e "${GREEN}[SUCCESS]${NC} $1"
}
log_warning() {
echo -e "${YELLOW}[WARNING]${NC} $1"
}
log_error() {
echo -e "${RED}[ERROR]${NC} $1"
}
# 检查敏感信息泄露
check_secrets() {
log_info "检查敏感信息泄露..."
local patterns=(
"password\s*=\s*['\"][^'\"]*['\"]"
"token\s*=\s*['\"][^'\"]*['\"]"
"api_key\s*=\s*['\"][^'\"]*['\"]"
"secret\s*=\s*['\"][^'\"]*['\"]"
"private_key"
"-----BEGIN.*PRIVATE KEY-----"
)
local found_secrets=0
for pattern in "${patterns[@]}"; do
local matches
matches=$(grep -r -i -E "$pattern" . --exclude-dir=.git --exclude-dir=backups 2>/dev/null || true)
if [ -n "$matches" ]; then
log_error "发现可能的敏感信息:"
echo "$matches"
((found_secrets++))
((HIGH_ISSUES++))
fi
done
if [ "$found_secrets" -eq 0 ]; then
log_success "未发现明显的敏感信息泄露"
else
log_error "发现 $found_secrets 种类型的敏感信息,请检查并移除"
fi
((TOTAL_ISSUES += found_secrets))
}
# 检查不安全的命令使用
check_unsafe_commands() {
log_info "检查不安全的命令使用..."
local unsafe_patterns=(
"rm\s+-rf\s+/"
"chmod\s+777"
"curl.*-k"
"wget.*--no-check-certificate"
)
local unsafe_found=0
for pattern in "${unsafe_patterns[@]}"; do
local matches
matches=$(grep -r -E "$pattern" scripts/ 2>/dev/null || true)
if [ -n "$matches" ]; then
log_warning "发现可能不安全的命令使用:"
echo "$matches"
((unsafe_found++))
((MEDIUM_ISSUES++))
fi
done
if [ "$unsafe_found" -eq 0 ]; then
log_success "未发现明显不安全的命令使用"
else
log_warning "发现 $unsafe_found 个可能不安全的命令,请检查"
fi
((TOTAL_ISSUES += unsafe_found))
}
# 生成报告
generate_report() {
log_info "生成安全扫描报告..."
echo
echo "=================================="
echo " 安全扫描报告"
echo "=================================="
echo "总问题数: $TOTAL_ISSUES"
echo "高危: $HIGH_ISSUES"
echo "中危: $MEDIUM_ISSUES"
echo "低危: $LOW_ISSUES"
echo "=================================="
if [ "$TOTAL_ISSUES" -eq 0 ]; then
log_success "安全扫描通过,未发现问题!"
return 0
else
log_warning "发现 $TOTAL_ISSUES 个安全问题,请检查并修复"
return 1
fi
}
# 主函数
main() {
log_info "开始安全扫描..."
check_secrets
check_unsafe_commands
generate_report
}
# 执行主函数
main "$@"

View File

@ -0,0 +1,58 @@
#!/bin/bash
# 为所有 Nomad Server 部署 Consul Client
echo "🚀 部署 Consul Client 到所有 Nomad Server 节点"
echo "================================================"
# 部署 Consul Client
echo "1. 部署 Consul Client..."
ansible-playbook -i ansible/inventory/hosts.yml \
ansible/consul-client-deployment.yml \
--limit nomad_servers
if [ $? -eq 0 ]; then
echo "✅ Consul Client 部署成功"
else
echo "❌ Consul Client 部署失败"
exit 1
fi
# 更新 Nomad 配置
echo ""
echo "2. 更新 Nomad Server 配置..."
echo "需要手动更新每个 Nomad Server 的配置:"
echo ""
echo "修改 /etc/nomad.d/nomad.hcl 中的 consul 块:"
echo "consul {"
echo " address = \"127.0.0.1:8500\" # 改为本地"
echo " server_service_name = \"nomad\""
echo " client_service_name = \"nomad-client\""
echo " auto_advertise = true"
echo " server_auto_join = true"
echo " client_auto_join = false"
echo "}"
echo ""
echo "然后重启 Nomad 服务:"
echo "systemctl restart nomad"
echo ""
echo "3. 验证部署..."
sleep 5
# 验证 Consul Client
for server in semaphore ch3 ash1d ash2e ch2 de onecloud1; do
echo "检查 $server..."
if curl -s http://$server.tailnet-68f9.ts.net:8500/v1/status/leader > /dev/null 2>&1; then
echo "$server - Consul Client 运行正常"
else
echo "$server - Consul Client 无响应"
fi
done
echo ""
echo "🎉 部署完成!"
echo "下一步:"
echo "1. 手动更新每个 Nomad Server 的配置文件"
echo "2. 重启 Nomad 服务"
echo "3. 验证 Nomad 与 Consul 的集成"

View File

@ -1,217 +0,0 @@
#!/bin/bash
# Consul 变量和存储配置示例脚本
# 此脚本展示了如何配置Consul的变量和存储功能
set -e
# 配置参数
CONSUL_ADDR=${CONSUL_ADDR:-"http://localhost:8500"}
ENVIRONMENT=${ENVIRONMENT:-"dev"}
PROVIDER=${PROVIDER:-"oracle"}
REGION=${REGION:-"kr"}
echo "Consul 变量和存储配置示例"
echo "========================="
echo "Consul 地址: $CONSUL_ADDR"
echo "环境: $ENVIRONMENT"
echo "提供商: $PROVIDER"
echo "区域: $REGION"
echo ""
# 检查Consul连接
check_consul_connection() {
echo "检查Consul连接..."
if curl -s "$CONSUL_ADDR/v1/status/leader" > /dev/null; then
echo "✓ Consul连接正常"
else
echo "✗ 无法连接到Consul请检查Consul服务是否运行"
exit 1
fi
}
# 配置应用变量
configure_app_variables() {
echo "配置应用变量..."
# 应用基本信息
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/app/name" -d "my-application"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/app/version" -d "1.0.0"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/app/environment" -d "$ENVIRONMENT"
# 特性开关
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/features/new_ui" -d "true"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/features/advanced_analytics" -d "false"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/features/beta_features" -d "true"
echo "✓ 应用变量配置完成"
}
# 配置数据库变量
configure_database_variables() {
echo "配置数据库变量..."
# 数据库连接信息
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/database/host" -d "db.example.com"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/database/port" -d "5432"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/database/name" -d "myapp_db"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/database/ssl_mode" -d "require"
# 数据库连接池配置
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/database/max_connections" -d "100"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/database/min_connections" -d "10"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/database/connection_timeout" -d "30s"
echo "✓ 数据库变量配置完成"
}
# 配置缓存变量
configure_cache_variables() {
echo "配置缓存变量..."
# Redis配置
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/cache/host" -d "redis.example.com"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/cache/port" -d "6379"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/cache/password" -d "secure_password"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/cache/db" -d "0"
# 缓存策略
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/cache/ttl" -d "3600"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/cache/max_memory" -d "2gb"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/cache/eviction_policy" -d "allkeys-lru"
echo "✓ 缓存变量配置完成"
}
# 配置消息队列变量
configure_messaging_variables() {
echo "配置消息队列变量..."
# RabbitMQ配置
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/messaging/host" -d "rabbitmq.example.com"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/messaging/port" -d "5672"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/messaging/username" -d "myapp"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/messaging/password" -d "secure_password"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/messaging/vhost" -d "/myapp"
# 队列配置
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/messaging/queue_name" -d "tasks"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/messaging/exchange" -d "myapp_exchange"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/messaging/routing_key" -d "task.#"
echo "✓ 消息队列变量配置完成"
}
# 配置云服务提供商变量
configure_provider_variables() {
echo "配置云服务提供商变量..."
if [ "$PROVIDER" = "oracle" ]; then
# Oracle Cloud配置
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$region/tenancy_ocid" -d "ocid1.tenancy.oc1..aaaaaaaayourtenancyocid"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$region/user_ocid" -d "ocid1.user.oc1..aaaaaaaayouruserocid"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$region/fingerprint" -d "your-fingerprint"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$region/region" -d "$REGION"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$region/compartment_id" -d "ocid1.compartment.oc1..aaaaaaaayourcompartmentid"
elif [ "$PROVIDER" = "aws" ]; then
# AWS配置
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$region/access_key" -d "your-access-key"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$region/secret_key" -d "your-secret-key"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$region/region" -d "$REGION"
elif [ "$PROVIDER" = "gcp" ]; then
# GCP配置
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$region/project_id" -d "your-project-id"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$region/region" -d "$REGION"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$region/credentials_path" -d "/path/to/service-account.json"
elif [ "$PROVIDER" = "digitalocean" ]; then
# DigitalOcean配置
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$region/token" -d "your-do-token"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$region/region" -d "$REGION"
fi
echo "✓ 云服务提供商变量配置完成"
}
# 配置存储相关变量
configure_storage_variables() {
echo "配置存储相关变量..."
# 快照配置
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/snapshot/enabled" -d "true"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/snapshot/interval" -d "24h"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/snapshot/retain" -d "30"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/snapshot/name" -d "consul-snapshot-{{.Timestamp}}"
# 备份配置
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/backup/enabled" -d "true"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/backup/interval" -d "6h"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/backup/retain" -d "7"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/backup/name" -d "consul-backup-{{.Timestamp}}"
# 数据目录配置
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/data_dir" -d "/opt/consul/data"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/raft_dir" -d "/opt/consul/raft"
# Autopilot配置
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/autopilot/cleanup_dead_servers" -d "true"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/autopilot/last_contact_threshold" -d "200ms"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/autopilot/max_trailing_logs" -d "250"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/autopilot/server_stabilization_time" -d "10s"
echo "✓ 存储相关变量配置完成"
}
# 显示配置结果
display_configuration() {
echo ""
echo "配置结果:"
echo "========="
echo "应用配置:"
curl -s "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/app/?recurse" | jq -r '.[] | "\(.Key): \(.Value | @base64d)"' 2>/dev/null || echo " (需要安装jq以查看格式化输出)"
echo ""
echo "数据库配置:"
curl -s "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/database/?recurse" | jq -r '.[] | "\(.Key): \(.Value | @base64d)"' 2>/dev/null || echo " (需要安装jq以查看格式化输出)"
echo ""
echo "缓存配置:"
curl -s "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/cache/?recurse" | jq -r '.[] | "\(.Key): \(.Value | @base64d)"' 2>/dev/null || echo " (需要安装jq以查看格式化输出)"
echo ""
echo "消息队列配置:"
curl -s "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/messaging/?recurse" | jq -r '.[] | "\(.Key): \(.Value | @base64d)"' 2>/dev/null || echo " (需要安装jq以查看格式化输出)"
echo ""
echo "云服务提供商配置:"
curl -s "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/?recurse" | jq -r '.[] | "\(.Key): \(.Value | @base64d)"' 2>/dev/null || echo " (需要安装jq以查看格式化输出)"
echo ""
echo "存储配置:"
curl -s "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/?recurse" | jq -r '.[] | "\(.Key): \(.Value | @base64d)"' 2>/dev/null || echo " (需要安装jq以查看格式化输出)"
}
# 主函数
main() {
check_consul_connection
configure_app_variables
configure_database_variables
configure_cache_variables
configure_messaging_variables
configure_provider_variables
configure_storage_variables
display_configuration
echo ""
echo "✓ 所有变量和存储配置已完成!"
echo ""
echo "使用说明:"
echo "1. 在Terraform中使用consul_keys数据源获取这些配置"
echo "2. 在应用程序中使用Consul客户端库读取这些配置"
echo "3. 使用Consul UI查看和管理这些配置"
echo ""
echo "配置文件位置: /root/mgmt/docs/setup/consul_variables_and_storage_guide.md"
}
# 执行主函数
main "$@"

View File

@ -1,117 +0,0 @@
#!/bin/bash
# Consul集群部署脚本 - 遵循最佳变量命名规范
# 此脚本将部署一个完全遵循 config/{environment}/{provider}/{region_or_service}/{key} 格式的Consul集群
set -e
# 配置参数
CONSUL_ADDR="${CONSUL_ADDR:-localhost:8500}"
ENVIRONMENT="${ENVIRONMENT:-dev}"
NOMAD_ADDR="${NOMAD_ADDR:-localhost:4646}"
CONSUL_CONFIG_DIR="${CONSUL_CONFIG_DIR:-/root/mgmt/components/consul/configs}"
CONSUL_JOBS_DIR="${CONSUL_JOBS_DIR:-/root/mgmt/components/consul/jobs}"
echo "开始部署遵循最佳变量命名规范的Consul集群..."
echo "Consul地址: $CONSUL_ADDR"
echo "Nomad地址: $NOMAD_ADDR"
echo "环境: $ENVIRONMENT"
# 检查Consul连接
echo "检查Consul连接..."
if ! curl -s "$CONSUL_ADDR/v1/status/leader" | grep -q "."; then
echo "错误: 无法连接到Consul服务器 $CONSUL_ADDR"
exit 1
fi
echo "Consul连接成功"
# 检查Nomad连接
echo "检查Nomad连接..."
if ! curl -s "$NOMAD_ADDR/v1/status/leader" | grep -q "."; then
echo "错误: 无法连接到Nomad服务器 $NOMAD_ADDR"
exit 1
fi
echo "Nomad连接成功"
# 步骤1: 设置Consul变量
echo "步骤1: 设置Consul变量..."
/root/mgmt/deployment/scripts/setup_consul_cluster_variables.sh
# 步骤2: 生成Consul配置文件
echo "步骤2: 生成Consul配置文件..."
/root/mgmt/deployment/scripts/generate_consul_config.sh
# 步骤3: 停止现有的Consul集群
echo "步骤3: 停止现有的Consul集群..."
if nomad job status consul-cluster-simple 2>/dev/null; then
nomad job stop consul-cluster-simple
echo "已停止现有的consul-cluster-simple作业"
fi
if nomad job status consul-cluster-dynamic 2>/dev/null; then
nomad job stop consul-cluster-dynamic
echo "已停止现有的consul-cluster-dynamic作业"
fi
if nomad job status consul-cluster-kv 2>/dev/null; then
nomad job stop consul-cluster-kv
echo "已停止现有的consul-cluster-kv作业"
fi
# 步骤4: 部署新的Consul集群
echo "步骤4: 部署新的Consul集群..."
nomad job run $CONSUL_JOBS_DIR/consul-cluster-kv.nomad
# 步骤5: 验证部署
echo "步骤5: 验证部署..."
sleep 10
# 检查作业状态
if nomad job status consul-cluster-kv | grep -q "running"; then
echo "Consul集群作业正在运行"
else
echo "错误: Consul集群作业未运行"
exit 1
fi
# 检查Consul集群状态
if curl -s "$CONSUL_ADDR/v1/status/leader" | grep -q "."; then
echo "Consul集群leader已选举"
else
echo "错误: Consul集群leader未选举"
exit 1
fi
# 检查节点数量
NODE_COUNT=$(curl -s "$CONSUL_ADDR/v1/status/peers" | jq '. | length')
if [ "$NODE_COUNT" -eq 3 ]; then
echo "Consul集群节点数量正确: $NODE_COUNT"
else
echo "警告: Consul集群节点数量不正确: $NODE_COUNT (期望: 3)"
fi
# 步骤6: 验证变量配置
echo "步骤6: 验证变量配置..."
# 检查一些关键变量
if curl -s "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/cluster/datacenter" | jq -r '.[].Value' | base64 -d | grep -q "dc1"; then
echo "Consul数据中心配置正确"
else
echo "警告: Consul数据中心配置可能不正确"
fi
if curl -s "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/nodes/master/ip" | jq -r '.[].Value' | base64 -d | grep -q "100.117.106.136"; then
echo "Consul master节点IP配置正确"
else
echo "警告: Consul master节点IP配置可能不正确"
fi
# 步骤7: 显示访问信息
echo "步骤7: 显示访问信息..."
echo "Consul UI地址: http://100.117.106.136:8500"
echo "Consul API地址: http://100.117.106.136:8500/v1"
echo "Nomad UI地址: http://100.117.106.136:4646"
echo "Nomad API地址: http://100.117.106.136:4646/v1"
echo "Consul集群部署完成"
echo "集群现在完全遵循最佳变量命名规范: config/{environment}/{provider}/{region_or_service}/{key}"

View File

@ -1,143 +0,0 @@
#!/bin/bash
# 部署Vault集群的脚本
# 检查并安装Vault
if ! which vault >/dev/null; then
echo "==== 安装Vault ===="
VAULT_VERSION="1.20.4"
wget -q https://releases.hashicorp.com/vault/${VAULT_VERSION}/vault_${VAULT_VERSION}_linux_amd64.zip
unzip -q vault_${VAULT_VERSION}_linux_amd64.zip
sudo mv vault /usr/local/bin/
rm vault_${VAULT_VERSION}_linux_amd64.zip
fi
export PATH=$PATH:/usr/local/bin
set -e
echo "===== 开始部署Vault集群 ====="
# 目录定义
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
ROOT_DIR="$(dirname "$SCRIPT_DIR")"
ANSIBLE_DIR="$ROOT_DIR/playbooks"
JOBS_DIR="$ROOT_DIR/components/vault/jobs"
# 颜色定义
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
RED='\033[0;31m'
NC='\033[0m' # No Color
# 函数定义
log_info() {
echo -e "${GREEN}[INFO]${NC} $1"
}
log_warn() {
echo -e "${YELLOW}[WARN]${NC} $1"
}
log_error() {
echo -e "${RED}[ERROR]${NC} $1"
}
# 检查命令是否存在
check_command() {
if ! command -v $1 &> /dev/null; then
log_error "$1 命令未找到,请先安装"
exit 1
fi
}
# 检查必要的命令
check_command ansible-playbook
check_command nomad
check_command vault
# 步骤1: 使用Ansible安装Vault
log_info "步骤1: 使用Ansible安装Vault..."
ansible-playbook -i "$ANSIBLE_DIR/inventories/production/vault.ini" "$ANSIBLE_DIR/playbooks/install/install_vault.yml"
# 步骤2: 部署Vault Nomad作业
log_info "步骤2: 部署Vault Nomad作业..."
nomad job run "$JOBS_DIR/vault-cluster-exec.nomad"
# 等待Nomad作业部署完成
log_info "等待Nomad作业部署完成..."
sleep 10
# 检查Nomad作业状态
nomad_status=$(nomad job status vault-cluster-exec | grep Status | head -1 | awk '{print $3}')  # "Status = running" 行的第3列才是状态值
if [ "$nomad_status" != "running" ]; then
log_warn "Vault Nomad作业状态不是'running',当前状态: $nomad_status"
log_info "请检查Nomad作业状态: nomad job status vault-cluster-exec"
fi
# 步骤3: 检查Vault状态并初始化如果需要
log_info "步骤3: 检查Vault状态..."
export VAULT_ADDR='http://127.0.0.1:8200'
# 等待Vault启动
log_info "等待Vault启动..."
for i in {1..30}; do
if curl -s "$VAULT_ADDR/v1/sys/health" > /dev/null; then
break
fi
echo -n "."
sleep 2
done
echo ""
# 检查Vault是否已初始化
init_status=$(curl -s "$VAULT_ADDR/v1/sys/health" | grep -o '"initialized":[^,}]*' | cut -d ':' -f2)
if [ "$init_status" = "false" ]; then
log_info "Vault未初始化正在初始化..."
# 初始化Vault并保存密钥
mkdir -p "$ROOT_DIR/security/secrets/vault"
vault operator init -key-shares=5 -key-threshold=3 -format=json > "$ROOT_DIR/security/secrets/vault/init_keys.json"
if [ $? -eq 0 ]; then
log_info "Vault初始化成功解封密钥和根令牌已保存到 $ROOT_DIR/security/secrets/vault/init_keys.json"
log_warn "请确保安全保存这些密钥!"
# 提取解封密钥
unseal_key1=$(cat "$ROOT_DIR/security/secrets/vault/init_keys.json" | grep -o '"unseal_keys_b64":\[\([^]]*\)' | sed 's/"unseal_keys_b64":\[//g' | tr ',' '\n' | sed 's/"//g' | head -1)
unseal_key2=$(cat "$ROOT_DIR/security/secrets/vault/init_keys.json" | grep -o '"unseal_keys_b64":\[\([^]]*\)' | sed 's/"unseal_keys_b64":\[//g' | tr ',' '\n' | sed 's/"//g' | head -2 | tail -1)
unseal_key3=$(cat "$ROOT_DIR/security/secrets/vault/init_keys.json" | grep -o '"unseal_keys_b64":\[\([^]]*\)' | sed 's/"unseal_keys_b64":\[//g' | tr ',' '\n' | sed 's/"//g' | head -3 | tail -1)
# 解封Vault
log_info "正在解封Vault..."
vault operator unseal "$unseal_key1"
vault operator unseal "$unseal_key2"
vault operator unseal "$unseal_key3"
log_info "Vault已成功解封"
else
log_error "Vault初始化失败"
exit 1
fi
else
log_info "Vault已初始化"
# 检查Vault是否已解封
sealed_status=$(curl -s "$VAULT_ADDR/v1/sys/health" | grep -o '"sealed":[^,}]*' | cut -d ':' -f2)
if [ "$sealed_status" = "true" ]; then
log_warn "Vault已初始化但仍处于密封状态请手动解封"
log_info "使用以下命令解封Vault:"
log_info "export VAULT_ADDR='http://127.0.0.1:8200'"
log_info "vault operator unseal <解封密钥1>"
log_info "vault operator unseal <解封密钥2>"
log_info "vault operator unseal <解封密钥3>"
else
log_info "Vault已初始化且已解封可以正常使用"
fi
fi
# 显示Vault状态
log_info "Vault状态:"
vault status
log_info "===== Vault集群部署完成 ====="
log_info "请在其他节点上运行解封操作,确保集群完全可用"

View File

@ -1,50 +0,0 @@
#!/bin/bash
# Vault开发环境使用示例
echo "===== Vault开发环境使用示例 ====="
# 设置环境变量
source /root/mgmt/security/secrets/vault/dev/vault_env.sh
echo "1. 检查Vault状态"
vault status
echo ""
echo "2. 写入示例密钥值"
vault kv put secret/myapp/config username="devuser" password="devpassword" database="devdb"
echo ""
echo "3. 读取示例密钥值"
vault kv get secret/myapp/config
echo ""
echo "4. 列出密钥路径"
vault kv list secret/myapp/
echo ""
echo "5. 创建示例策略"
cat > /tmp/dev-policy.hcl << EOF
# 开发环境示例策略
path "secret/*" {
capabilities = ["create", "read", "update", "delete", "list"]
}
path "sys/mounts" {
capabilities = ["read"]
}
EOF
vault policy write dev-policy /tmp/dev-policy.hcl
echo ""
echo "6. 创建有限权限令牌"
vault token create -policy=dev-policy
echo ""
echo "7. 启用并配置其他密钥引擎示例"
echo "启用数据库密钥引擎:"
echo "vault secrets enable database"
echo ""
echo "===== Vault开发环境示例完成 ====="
echo "注意:这些命令仅用于开发测试,请勿在生产环境中使用相同配置"

View File

@ -1,56 +0,0 @@
#!/bin/bash
# Vault开发环境快速开始指南
echo "===== Vault开发环境快速开始 ====="
# 1. 设置环境变量
echo "1. 设置环境变量"
source /root/mgmt/security/secrets/vault/dev/vault_env.sh
echo "VAULT_ADDR: $VAULT_ADDR"
echo "VAULT_TOKEN: $VAULT_TOKEN"
# 2. 检查Vault状态
echo ""
echo "2. 检查Vault状态"
vault status
# 3. 存储密钥值
echo ""
echo "3. 存储密钥值"
vault kv put secret/example/api_key value="my_secret_api_key_12345"
# 4. 读取密钥值
echo ""
echo "4. 读取密钥值"
vault kv get secret/example/api_key
# 5. 列出密钥路径
echo ""
echo "5. 列出密钥路径"
vault kv list secret/example/
# 6. 创建策略示例
echo ""
echo "6. 创建示例策略"
cat > /tmp/example-policy.hcl << EOF
# 示例策略 - 允许读取secret/example路径下的密钥
path "secret/example/*" {
capabilities = ["read", "list"]
}
# 允许列出密钥引擎
path "sys/mounts" {
capabilities = ["read"]
}
EOF
vault policy write example-policy /tmp/example-policy.hcl
# 7. 创建有限权限令牌
echo ""
echo "7. 创建有限权限令牌"
vault token create -policy=example-policy
echo ""
echo "===== Vault开发环境快速开始完成 ====="
echo "您现在可以开始在开发环境中使用Vault了"

62
scripts/diagnose-consul-sync.sh Executable file
View File

@ -0,0 +1,62 @@
#!/bin/bash
# Consul 集群同步诊断脚本
echo "=== Consul 集群同步诊断 ==="
echo "时间: $(date)"
echo ""
CONSUL_NODES=(
"master.tailnet-68f9.ts.net:8500"
"warden.tailnet-68f9.ts.net:8500"
"ash3c.tailnet-68f9.ts.net:8500"
)
echo "1. 检查集群状态"
echo "=================="
for node in "${CONSUL_NODES[@]}"; do
echo "节点: $node"
echo " Leader: $(curl -s http://$node/v1/status/leader 2>/dev/null || echo 'ERROR')"
echo " Peers: $(curl -s http://$node/v1/status/peers 2>/dev/null | jq length 2>/dev/null || echo 'ERROR')"
echo ""
done
echo "2. 检查服务注册"
echo "================"
for node in "${CONSUL_NODES[@]}"; do
echo "节点: $node"
echo " Catalog 服务:"
curl -s http://$node/v1/catalog/services 2>/dev/null | jq -r 'keys[]' 2>/dev/null | grep -E "(consul-lb|traefik)" | sed 's/^/ /' || echo " ERROR 或无服务"
echo " Agent 服务:"
curl -s http://$node/v1/agent/services 2>/dev/null | jq -r 'keys[]' 2>/dev/null | grep -E "traefik" | sed 's/^/ /' || echo " 无本地服务"
echo ""
done
echo "3. 检查健康状态"
echo "================"
for node in "${CONSUL_NODES[@]}"; do
echo "节点: $node"
checks=$(curl -s http://$node/v1/agent/checks 2>/dev/null)
if [ $? -eq 0 ]; then
echo "$checks" | jq -r 'to_entries[] | select(.key | contains("traefik")) | " \(.key): \(.value.Status)"' 2>/dev/null || echo " 无 Traefik 健康检查"
else
echo " ERROR: 无法连接"
fi
echo ""
done
echo "4. 网络连通性测试"
echo "=================="
echo "测试从当前节点到 Traefik 的连接:"
curl -s -w " HTTP %{http_code} - 响应时间: %{time_total}s\n" -o /dev/null http://100.97.62.111:80/ || echo " ERROR: 无法连接到 Traefik"
curl -s -w " HTTP %{http_code} - 响应时间: %{time_total}s\n" -o /dev/null http://100.97.62.111:8080/api/overview || echo " ERROR: 无法连接到 Traefik Dashboard"
echo ""
echo "5. 建议操作"
echo "==========="
echo "如果发现问题:"
echo " 1. 重新注册服务: ./scripts/register-traefik-to-all-consul.sh"
echo " 2. 检查 Consul 日志: nomad alloc logs \$(nomad job allocs consul-cluster-nomad | grep warden | awk '{print \$1}') consul"
echo " 3. 重启有问题的 Consul 节点"
echo " 4. 检查网络连通性和防火墙设置"

View File

@ -1,87 +0,0 @@
#!/bin/bash
# 链接所有MCP配置文件的脚本
# 该脚本将所有IDE和AI助手的MCP配置链接到NFS共享的配置文件
NFS_CONFIG="/mnt/fnsync/mcp/mcp_shared_config.json"
echo "链接所有MCP配置文件到NFS共享配置..."
# 检查NFS配置文件是否存在
if [ ! -f "$NFS_CONFIG" ]; then
echo "错误: NFS配置文件不存在: $NFS_CONFIG"
exit 1
fi
echo "✓ 使用NFS共享配置作为基准: $NFS_CONFIG"
# 定义所有可能的MCP配置位置
CONFIGS=(
# Kilo Code IDE (全局配置,移除了项目级别配置以避免冲突)
"../.trae-server/data/User/globalStorage/kilocode.kilo-code/settings/mcp_settings.json"
# Tencent CodeBuddy
"$HOME/.codebuddy-server/data/User/globalStorage/tencent.planning-genie/settings/codebuddy_mcp_settings.json"
"$HOME/.codebuddy/data/User/globalStorage/tencent.planning-genie/settings/codebuddy_mcp_settings.json"
# 新增的CodeBuddy-CN
"$HOME/.codebuddy-server-cn/data/User/globalStorage/tencent.planning-genie/settings/codebuddy_mcp_settings.json"
# Claude相关
"$HOME/.claude.json"
"$HOME/.claude.json.backup"
"$HOME/.config/claude/settings/mcp_settings.json"
# Cursor
"$HOME/.cursor-server/data/User/globalStorage/xxx.cursor/settings/mcp_settings.json"
# Qoder
"$HOME/.qoder-server/data/User/globalStorage/xxx.qoder/settings/mcp_settings.json"
# Cline
"$HOME/.codebuddy-server/data/User/globalStorage/rooveterinaryinc.roo-cline/settings/mcp_settings.json"
"$HOME/Cline/settings/mcp_settings.json"
# Kiro
"$HOME/.kiro-server/data/User/globalStorage/xxx.kiro/settings/mcp_settings.json"
# Qwen
"$HOME/.qwen/settings/mcp_settings.json"
# VSCodium
"$HOME/.vscodium-server/data/User/globalStorage/xxx.vscodium/settings/mcp_settings.json"
# Other potential locations
".kilocode/mcp.json"
"$HOME/.config/Qoder/SharedClientCache/mcp.json"
"$HOME/.trae-server/data/Machine/mcp.json"
"$HOME/.trae-cn-server/data/Machine/mcp.json"
"$HOME/.codegeex/agent/configs/user_mcp_config.json"
"$HOME/.codegeex/agent/configs/mcp_config.json"
)
# 链接到每个配置位置
for config_path in "${CONFIGS[@]}"; do
if [ -n "$config_path" ]; then
config_dir=$(dirname "$config_path")
if [ -d "$config_dir" ]; then
# 如果目标文件已存在,先备份
if [ -f "$config_path" ]; then
mv "$config_path" "${config_path}.backup"
echo "✓ 原配置文件已备份: ${config_path}.backup"
fi
# 创建符号链接
ln -s "$NFS_CONFIG" "$config_path" 2>/dev/null
if [ $? -eq 0 ]; then
echo "✓ 已创建链接到: $config_path"
else
echo "✗ 创建链接失败: $config_path"
fi
else
echo "✗ 目录不存在: $config_dir"
fi
fi
done
echo "所有MCP配置链接完成"
echo "所有IDE和AI助手现在都使用NFS共享的MCP配置文件: $NFS_CONFIG"

View File

@ -1,380 +0,0 @@
#!/usr/bin/env python3
"""
Qdrant MCP 服务器
此脚本实现了一个与 Qdrant 向量数据库集成的 MCP 服务器
"""
import asyncio
import json
import os
import sys
from typing import Any, Dict, List, Optional
import logging
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct, Filter
# 设置日志
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
class QdrantMCPServer:
def __init__(self):
# 从环境变量获取配置
self.qdrant_url = os.getenv("QDRANT_URL", "http://localhost:6333")
self.qdrant_api_key = os.getenv("QDRANT_API_KEY", "")
self.collection_name = os.getenv("COLLECTION_NAME", "mcp")
self.embedding_model = os.getenv("EMBEDDING_MODEL", "bge-m3")
# 初始化 Qdrant 客户端
self.client = QdrantClient(
url=self.qdrant_url,
api_key=self.qdrant_api_key if self.qdrant_api_key else None
)
# 确保集合存在
self._ensure_collection_exists()
logger.info(f"Qdrant MCP 服务器已初始化")
logger.info(f"Qdrant URL: {self.qdrant_url}")
logger.info(f"集合名称: {self.collection_name}")
logger.info(f"嵌入模型: {self.embedding_model}")
def _ensure_collection_exists(self):
"""确保集合存在,如果不存在则创建"""
try:
collections = self.client.get_collections().collections
collection_names = [collection.name for collection in collections]
if self.collection_name not in collection_names:
# 创建新集合
self.client.create_collection(
collection_name=self.collection_name,
vectors_config=VectorParams(size=1024, distance=Distance.COSINE)
)
logger.info(f"已创建新集合: {self.collection_name}")
else:
logger.info(f"集合已存在: {self.collection_name}")
except Exception as e:
logger.error(f"确保集合存在时出错: {e}")
raise
async def handle_request(self, request: Dict[str, Any]) -> Dict[str, Any]:
"""处理 MCP 请求"""
method = request.get("method")
params = request.get("params", {})
request_id = request.get("id")
logger.info(f"收到请求: {method}")
try:
if method == "initialize":
result = await self.initialize(params)
elif method == "tools/list":
result = await self.list_tools(params)
elif method == "tools/call":
result = await self.call_tool(params)
elif method == "resources/list":
result = await self.list_resources(params)
elif method == "resources/read":
result = await self.read_resource(params)
else:
result = {
"error": {
"code": -32601,
"message": f"未知方法: {method}"
}
}
except Exception as e:
logger.error(f"处理请求时出错: {e}")
result = {
"error": {
"code": -32603,
"message": f"内部错误: {str(e)}"
}
}
response = {
"jsonrpc": "2.0",
"id": request_id,
**result
}
return response
async def initialize(self, params: Dict[str, Any]) -> Dict[str, Any]:
"""初始化 MCP 服务器"""
logger.info("初始化 Qdrant MCP 服务器")
return {
"result": {
"protocolVersion": "2024-11-05",
"capabilities": {
"tools": {
"listChanged": False
},
"resources": {
"subscribe": False,
"listChanged": False
}
},
"serverInfo": {
"name": "qdrant-mcp-server",
"version": "1.0.0"
}
}
}
async def list_tools(self, params: Dict[str, Any]) -> Dict[str, Any]:
"""列出可用工具"""
return {
"result": {
"tools": [
{
"name": "qdrant_search",
"description": "在 Qdrant 中搜索相似向量",
"inputSchema": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "搜索查询文本"
},
"limit": {
"type": "integer",
"default": 5,
"description": "返回结果数量限制"
}
},
"required": ["query"]
}
},
{
"name": "qdrant_add",
"description": "向 Qdrant 添加向量",
"inputSchema": {
"type": "object",
"properties": {
"text": {
"type": "string",
"description": "要添加的文本内容"
},
"metadata": {
"type": "object",
"description": "与文本关联的元数据"
}
},
"required": ["text"]
}
},
{
"name": "qdrant_delete",
"description": "从 Qdrant 删除向量",
"inputSchema": {
"type": "object",
"properties": {
"id": {
"type": "string",
"description": "要删除的向量ID"
}
},
"required": ["id"]
}
}
]
}
}
async def call_tool(self, params: Dict[str, Any]) -> Dict[str, Any]:
"""调用工具"""
name = params.get("name")
arguments = params.get("arguments", {})
if name == "qdrant_search":
return await self._search_vectors(arguments)
elif name == "qdrant_add":
return await self._add_vector(arguments)
elif name == "qdrant_delete":
return await self._delete_vector(arguments)
else:
return {
"error": {
"code": -32601,
"message": f"未知工具: {name}"
}
}
async def _search_vectors(self, params: Dict[str, Any]) -> Dict[str, Any]:
"""搜索相似向量"""
query = params.get("query", "")
limit = params.get("limit", 5)
# 这里应该使用嵌入模型将查询转换为向量
# 由于我们没有实际的嵌入模型,这里使用一个简单的模拟
query_vector = [0.1] * 1024 # 模拟向量
try:
search_result = self.client.search(
collection_name=self.collection_name,
query_vector=query_vector,
limit=limit
)
results = []
for hit in search_result:
results.append({
"id": hit.id,
"score": hit.score,
"payload": hit.payload
})
return {
"result": {
"content": [
{
"type": "text",
"text": f"搜索结果: {json.dumps(results, ensure_ascii=False)}"
}
]
}
}
except Exception as e:
logger.error(f"搜索向量时出错: {e}")
return {
"error": {
"code": -32603,
"message": f"搜索向量时出错: {str(e)}"
}
}
async def _add_vector(self, params: Dict[str, Any]) -> Dict[str, Any]:
"""添加向量"""
text = params.get("text", "")
metadata = params.get("metadata", {})
# 生成一个简单的ID
import hashlib
vector_id = hashlib.md5(text.encode()).hexdigest()
# 这里应该使用嵌入模型将文本转换为向量
# 由于我们没有实际的嵌入模型,这里使用一个简单的模拟
vector = [0.1] * 1024 # 模拟向量
try:
self.client.upsert(
collection_name=self.collection_name,
points=[
PointStruct(
id=vector_id,
vector=vector,
payload={
"text": text,
**metadata
}
)
]
)
return {
"result": {
"content": [
{
"type": "text",
"text": f"已添加向量ID: {vector_id}"
}
]
}
}
except Exception as e:
logger.error(f"添加向量时出错: {e}")
return {
"error": {
"code": -32603,
"message": f"添加向量时出错: {str(e)}"
}
}
async def _delete_vector(self, params: Dict[str, Any]) -> Dict[str, Any]:
"""删除向量"""
vector_id = params.get("id", "")
try:
self.client.delete(
collection_name=self.collection_name,
points_selector=[vector_id]
)
return {
"result": {
"content": [
{
"type": "text",
"text": f"已删除向量ID: {vector_id}"
}
]
}
}
except Exception as e:
logger.error(f"删除向量时出错: {e}")
return {
"error": {
"code": -32603,
"message": f"删除向量时出错: {str(e)}"
}
}
async def list_resources(self, params: Dict[str, Any]) -> Dict[str, Any]:
"""列出资源"""
return {
"result": {
"resources": []
}
}
async def read_resource(self, params: Dict[str, Any]) -> Dict[str, Any]:
"""读取资源"""
return {
"error": {
"code": -32601,
"message": "不支持读取资源"
}
}
async def main():
"""主函数"""
server = QdrantMCPServer()
# 从标准输入读取请求
for line in sys.stdin:
try:
request = json.loads(line)
response = await server.handle_request(request)
print(json.dumps(response, ensure_ascii=False))
sys.stdout.flush()
except json.JSONDecodeError as e:
logger.error(f"解析 JSON 时出错: {e}")
error_response = {
"jsonrpc": "2.0",
"id": None,
"error": {
"code": -32700,
"message": f"解析 JSON 时出错: {str(e)}"
}
}
print(json.dumps(error_response, ensure_ascii=False))
sys.stdout.flush()
except Exception as e:
logger.error(f"处理请求时出错: {e}")
error_response = {
"jsonrpc": "2.0",
"id": None,
"error": {
"code": -32603,
"message": f"内部错误: {str(e)}"
}
}
print(json.dumps(error_response, ensure_ascii=False))
sys.stdout.flush()
if __name__ == "__main__":
asyncio.run(main())
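The server above speaks line-delimited JSON-RPC over stdin/stdout, so it can be exercised without an IDE. A minimal sketch, assuming the script is saved as qdrant_mcp_server.py (hypothetical name), qdrant-client is installed, and a Qdrant instance is reachable at the QDRANT_URL default.

```bash
#!/usr/bin/env bash
# Send initialize, tools/list and one tools/call request through the stdio loop.
printf '%s\n' \
  '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}' \
  '{"jsonrpc":"2.0","id":2,"method":"tools/list","params":{}}' \
  '{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"qdrant_add","arguments":{"text":"hello"}}}' \
  | python3 qdrant_mcp_server.py
```

Because the embeddings are mocked with a constant vector in this script, qdrant_search results are not meaningful; the loop is only useful for protocol testing.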

View File

@@ -1,117 +0,0 @@
#!/usr/bin/env python3
"""
Qdrant Ollama 嵌入模型集成示例
此脚本演示如何使用 Ollama 作为嵌入模型提供者与 Qdrant 向量数据库集成
"""
from langchain_ollama import OllamaEmbeddings
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
import os
def main():
# 1. 初始化 Ollama 嵌入模型
# 使用 nomic-embed-text 模型,这是 Ollama 推荐的嵌入模型
print("初始化 Ollama 嵌入模型...")
embeddings = OllamaEmbeddings(
model="nomic-embed-text",
base_url="http://localhost:11434" # Ollama 默认地址
)
# 2. 初始化 Qdrant 客户端
print("连接到 Qdrant 数据库...")
client = QdrantClient(
url="http://localhost:6333", # Qdrant 默认地址
api_key="313131" # 从之前查看的配置中获取的 API 密钥
)
# 3. 创建集合(如果不存在)
collection_name = "ollama_integration_test"
print(f"创建或检查集合: {collection_name}")
# 首先检查集合是否已存在
collections = client.get_collections().collections
collection_exists = any(collection.name == collection_name for collection in collections)
if not collection_exists:
# 创建新集合
# 首先获取嵌入模型的维度
sample_embedding = embeddings.embed_query("sample text")
vector_size = len(sample_embedding)
client.create_collection(
collection_name=collection_name,
vectors_config=VectorParams(
size=vector_size,
distance=Distance.COSINE
)
)
print(f"已创建新集合,向量维度: {vector_size}")
else:
print("集合已存在")
# 4. 准备示例数据
documents = [
"Qdrant 是一个高性能的向量搜索引擎",
"Ollama 是一个本地运行大语言模型的工具",
"向量数据库用于存储和检索高维向量",
"嵌入模型将文本转换为数值向量表示"
]
metadata = [
{"source": "qdrant_docs", "category": "database"},
{"source": "ollama_docs", "category": "llm"},
{"source": "vector_db_docs", "category": "database"},
{"source": "embedding_docs", "category": "ml"}
]
# 5. 使用 Ollama 生成嵌入并存储到 Qdrant
print("生成嵌入并存储到 Qdrant...")
points = []
for idx, (doc, meta) in enumerate(zip(documents, metadata)):
# 使用 Ollama 生成嵌入
embedding = embeddings.embed_query(doc)
# 创建 Qdrant 点
point = PointStruct(
id=idx,
vector=embedding,
payload={
"text": doc,
"metadata": meta
}
)
points.append(point)
# 上传点到 Qdrant
client.upsert(
collection_name=collection_name,
points=points
)
print(f"已上传 {len(points)} 个文档到 Qdrant")
# 6. 执行相似性搜索
query = "什么是向量数据库?"
print(f"\n执行搜索查询: '{query}'")
# 使用 Ollama 生成查询嵌入
query_embedding = embeddings.embed_query(query)
# 在 Qdrant 中搜索
search_result = client.search(
collection_name=collection_name,
query_vector=query_embedding,
limit=2
)
# 7. 显示搜索结果
print("\n搜索结果:")
for i, hit in enumerate(search_result, 1):
print(f"{i}. {hit.payload['text']} (得分: {hit.score:.4f})")
print(f" 元数据: {hit.payload['metadata']}")
print("\n集成测试完成!")
if __name__ == "__main__":
main()
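The same query flow can be reproduced without LangChain, which helps when debugging whether Ollama or Qdrant is the failing side. A hedged curl/jq sketch, assuming the standard Ollama /api/embeddings endpoint and the collection name, model and API key used in the example above.

```bash
#!/usr/bin/env bash
# Embed a query with Ollama, then search the collection created by the example.
set -euo pipefail

QUERY="什么是向量数据库?"

VEC=$(curl -s http://localhost:11434/api/embeddings \
  -d "{\"model\":\"nomic-embed-text\",\"prompt\":\"${QUERY}\"}" | jq -c '.embedding')

curl -s -X POST "http://localhost:6333/collections/ollama_integration_test/points/search" \
  -H "api-key: 313131" -H "Content-Type: application/json" \
  -d "{\"vector\": ${VEC}, \"limit\": 2, \"with_payload\": true}" | jq '.result'
```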

View File

@@ -1,357 +0,0 @@
#!/usr/bin/env python3
"""
Qdrant Ollama 嵌入模型集成的 MCP 服务器
此脚本实现了一个 MCP 服务器,使用 Ollama 作为嵌入模型提供者与 Qdrant 向量数据库集成
"""
import asyncio
import json
import os
import sys
from typing import Any, Dict, List, Optional
import logging
from langchain_ollama import OllamaEmbeddings
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct, Filter
# 设置日志
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
class QdrantOllamaMCPServer:
def __init__(self):
# 在初始化之前打印环境变量
print(f"环境变量:")
print(f"QDRANT_URL: {os.getenv('QDRANT_URL', '未设置')}")
print(f"QDRANT_API_KEY: {os.getenv('QDRANT_API_KEY', '未设置')}")
print(f"OLLAMA_URL: {os.getenv('OLLAMA_URL', '未设置')}")
print(f"OLLAMA_MODEL: {os.getenv('OLLAMA_MODEL', '未设置')}")
print(f"COLLECTION_NAME: {os.getenv('COLLECTION_NAME', '未设置')}")
# 从环境变量获取配置
self.qdrant_url = os.getenv("QDRANT_URL", "http://dev1:6333") # dev1服务器上的Qdrant地址
self.qdrant_api_key = os.getenv("QDRANT_API_KEY", "313131")
self.collection_name = os.getenv("COLLECTION_NAME", "ollama_mcp")
self.ollama_model = os.getenv("OLLAMA_MODEL", "nomic-embed-text")
self.ollama_url = os.getenv("OLLAMA_URL", "http://dev1:11434") # dev1服务器上的Ollama地址
# 初始化客户端
self.embeddings = OllamaEmbeddings(
model=self.ollama_model,
base_url=self.ollama_url
)
self.client = QdrantClient(
url=self.qdrant_url,
api_key=self.qdrant_api_key
)
# 确保集合存在
self._ensure_collection_exists()
logger.info(f"初始化完成,使用集合: {self.collection_name}")
def _ensure_collection_exists(self):
"""确保集合存在,如果不存在则创建"""
collections = self.client.get_collections().collections
collection_exists = any(collection.name == self.collection_name for collection in collections)
if not collection_exists:
# 获取嵌入模型的维度
sample_embedding = self.embeddings.embed_query("sample text")
vector_size = len(sample_embedding)
self.client.create_collection(
collection_name=self.collection_name,
vectors_config=VectorParams(
size=vector_size,
distance=Distance.COSINE
)
)
logger.info(f"已创建新集合,向量维度: {vector_size}")
else:
logger.info("集合已存在")
async def handle_request(self, request: Dict[str, Any]) -> Dict[str, Any]:
"""处理 MCP 请求"""
method = request.get("method")
params = request.get("params", {})
request_id = request.get("id")
logger.info(f"处理请求: {method}")
try:
if method == "initialize":
result = {
"protocolVersion": "2024-11-05",
"capabilities": {
"tools": {
"listChanged": True
},
"resources": {
"subscribe": True,
"listChanged": True
}
},
"serverInfo": {
"name": "qdrant-ollama-mcp-server",
"version": "1.0.0"
}
}
elif method == "tools/list":
result = {
"tools": [
{
"name": "add_document",
"description": "添加文档到向量数据库",
"inputSchema": {
"type": "object",
"properties": {
"text": {
"type": "string",
"description": "文档文本内容"
},
"metadata": {
"type": "object",
"description": "文档的元数据"
}
},
"required": ["text"]
}
},
{
"name": "search_documents",
"description": "在向量数据库中搜索相似文档",
"inputSchema": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "搜索查询文本"
},
"limit": {
"type": "integer",
"description": "返回结果数量限制",
"default": 5
},
"filter": {
"type": "object",
"description": "搜索过滤器"
}
},
"required": ["query"]
}
},
{
"name": "list_collections",
"description": "列出所有集合",
"inputSchema": {
"type": "object",
"properties": {}
}
},
{
"name": "get_collection_info",
"description": "获取集合信息",
"inputSchema": {
"type": "object",
"properties": {
"collection_name": {
"type": "string",
"description": "集合名称"
}
},
"required": ["collection_name"]
}
}
]
}
elif method == "tools/call":
tool_name = params.get("name")
tool_params = params.get("arguments", {})
if tool_name == "add_document":
result = await self._add_document(tool_params)
elif tool_name == "search_documents":
result = await self._search_documents(tool_params)
elif tool_name == "list_collections":
result = await self._list_collections(tool_params)
elif tool_name == "get_collection_info":
result = await self._get_collection_info(tool_params)
else:
raise ValueError(f"未知工具: {tool_name}")
else:
raise ValueError(f"未知方法: {method}")
response = {
"jsonrpc": "2.0",
"id": request_id,
"result": result
}
except Exception as e:
logger.error(f"处理请求时出错: {e}")
response = {
"jsonrpc": "2.0",
"id": request_id,
"error": {
"code": -1,
"message": str(e)
}
}
return response
async def _add_document(self, params: Dict[str, Any]) -> Dict[str, Any]:
"""添加文档到向量数据库"""
text = params.get("text")
metadata = params.get("metadata", {})
if not text:
raise ValueError("文档文本不能为空")
# 生成嵌入
embedding = self.embeddings.embed_query(text)
# 创建点
point = PointStruct(
id=hash(text) % (2 ** 31), # 使用文本哈希作为ID
vector=embedding,
payload={
"text": text,
"metadata": metadata
}
)
# 上传到 Qdrant
self.client.upsert(
collection_name=self.collection_name,
points=[point]
)
return {"success": True, "message": "文档已添加"}
async def _search_documents(self, params: Dict[str, Any]) -> Dict[str, Any]:
"""在向量数据库中搜索相似文档"""
query = params.get("query")
limit = params.get("limit", 5)
filter_dict = params.get("filter")
if not query:
raise ValueError("搜索查询不能为空")
# 生成查询嵌入
query_embedding = self.embeddings.embed_query(query)
# 构建过滤器
search_filter = None
if filter_dict:
search_filter = Filter(**filter_dict)
# 执行搜索
search_result = self.client.search(
collection_name=self.collection_name,
query_vector=query_embedding,
limit=limit,
query_filter=search_filter
)
# 格式化结果
results = []
for hit in search_result:
results.append({
"text": hit.payload.get("text", ""),
"metadata": hit.payload.get("metadata", {}),
"score": hit.score
})
return {"results": results}
async def _list_collections(self, params: Dict[str, Any]) -> Dict[str, Any]:
"""列出所有集合"""
collections = self.client.get_collections().collections
return {
"collections": [
{"name": collection.name} for collection in collections
]
}
async def _get_collection_info(self, params: Dict[str, Any]) -> Dict[str, Any]:
"""获取集合信息"""
collection_name = params.get("collection_name")
if not collection_name:
raise ValueError("集合名称不能为空")
try:
collection_info = self.client.get_collection(collection_name)
return {
"name": collection_name,
"vectors_count": collection_info.points_count,
"vectors_config": collection_info.config.params.vectors.dict()
}
except Exception as e:
raise ValueError(f"获取集合信息失败: {str(e)}")
async def run(self):
"""运行 MCP 服务器"""
logger.info("启动 Qdrant-Ollama MCP 服务器")
logger.info(f"Qdrant URL: {self.qdrant_url}")
logger.info(f"Ollama URL: {self.ollama_url}")
logger.info(f"Collection: {self.collection_name}")
# 从标准输入读取请求
while True:
try:
line = await asyncio.get_event_loop().run_in_executor(
None, sys.stdin.readline
)
if not line:
break
logger.info(f"收到请求: {line.strip()}")
# 解析 JSON 请求
request = json.loads(line.strip())
# 处理请求
response = await self.handle_request(request)
# 发送响应
response_json = json.dumps(response)
print(response_json, flush=True)
logger.info(f"发送响应: {response_json}")
except json.JSONDecodeError as e:
logger.error(f"JSON 解析错误: {e}")
except Exception as e:
logger.error(f"处理请求时出错: {e}")
except KeyboardInterrupt:
logger.info("服务器被中断")
break
async def main():
"""主函数"""
# 设置日志级别
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
# 打印环境变量
print(f"环境变量:")
print(f"QDRANT_URL: {os.getenv('QDRANT_URL', '未设置')}")
print(f"QDRANT_API_KEY: {os.getenv('QDRANT_API_KEY', '未设置')}")
print(f"OLLAMA_URL: {os.getenv('OLLAMA_URL', '未设置')}")
print(f"OLLAMA_MODEL: {os.getenv('OLLAMA_MODEL', '未设置')}")
print(f"COLLECTION_NAME: {os.getenv('COLLECTION_NAME', '未设置')}")
# 创建服务器实例
server = QdrantOllamaMCPServer()
# 运行服务器
await server.run()
if __name__ == "__main__":
asyncio.run(main())
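For troubleshooting outside the MCP loop, the get_collection_info tool has a direct REST equivalent; a small sketch using the dev1 defaults hard-coded above.

```bash
# Inspect the collection the server writes to (same host, key and name as above).
curl -s -H "api-key: 313131" "http://dev1:6333/collections/ollama_mcp" \
  | jq '{points: .result.points_count, vectors: .result.config.params.vectors}'
```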

View File

@@ -1,10 +0,0 @@
#!/bin/bash
# 设置环境变量
export QDRANT_URL=http://dev1:6333
export QDRANT_API_KEY=313131
export OLLAMA_URL=http://dev1:11434
export OLLAMA_MODEL=nomic-embed-text
export COLLECTION_NAME=ollama_mcp
# 启动MCP服务器
python /home/ben/qdrant/qdrant_ollama_mcp_server.py
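The launcher only wires up environment variables and starts the stdio server, so it can be smoke-tested by piping requests into it. A minimal sketch, assuming the launcher is saved as start_qdrant_ollama_mcp.sh (hypothetical name) and that Qdrant and Ollama on dev1 are reachable.

```bash
#!/usr/bin/env bash
# Add one document and immediately search for it through the MCP stdio protocol.
printf '%s\n' \
  '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{}}' \
  '{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{"name":"add_document","arguments":{"text":"Consul 提供服务发现","metadata":{"source":"smoke-test"}}}}' \
  '{"jsonrpc":"2.0","id":3,"method":"tools/call","params":{"name":"search_documents","arguments":{"query":"服务发现","limit":3}}}' \
  | bash start_qdrant_ollama_mcp.sh
```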

View File

@@ -0,0 +1,68 @@
#!/bin/bash
# 向所有三个 Consul 节点注册 Traefik 服务
# 解决 Consul leader 轮换问题
CONSUL_NODES=(
"master.tailnet-68f9.ts.net:8500"
"warden.tailnet-68f9.ts.net:8500"
"ash3c.tailnet-68f9.ts.net:8500"
)
TRAEFIK_IP="100.97.62.111"
ALLOC_ID=$(nomad job allocs traefik-consul-lb | head -2 | tail -1 | awk '{print $1}')
SERVICE_DATA_LB="{
\"ID\": \"traefik-consul-lb-${ALLOC_ID}\",
\"Name\": \"consul-lb\",
\"Tags\": [\"consul\", \"loadbalancer\", \"traefik\", \"multi-node\"],
\"Address\": \"${TRAEFIK_IP}\",
\"Port\": 80,
\"Check\": {
\"HTTP\": \"http://${TRAEFIK_IP}:80/\",
\"Interval\": \"30s\",
\"Timeout\": \"15s\"
}
}"
SERVICE_DATA_DASHBOARD="{
\"ID\": \"traefik-dashboard-${ALLOC_ID}\",
\"Name\": \"traefik-dashboard\",
\"Tags\": [\"traefik\", \"dashboard\", \"multi-node\"],
\"Address\": \"${TRAEFIK_IP}\",
\"Port\": 8080,
\"Check\": {
\"HTTP\": \"http://${TRAEFIK_IP}:8080/api/overview\",
\"Interval\": \"30s\",
\"Timeout\": \"15s\"
}
}"
echo "Registering Traefik services to all Consul nodes..."
echo "Allocation ID: ${ALLOC_ID}"
echo "Traefik IP: ${TRAEFIK_IP}"
for node in "${CONSUL_NODES[@]}"; do
echo "Registering to ${node}..."
# 注册 consul-lb 服务
curl -s -X PUT "http://${node}/v1/agent/service/register" \
-H "Content-Type: application/json" \
-d "${SERVICE_DATA_LB}"
# 注册 traefik-dashboard 服务
curl -s -X PUT "http://${node}/v1/agent/service/register" \
-H "Content-Type: application/json" \
-d "${SERVICE_DATA_DASHBOARD}"
echo "✓ Registered to ${node}"
done
echo ""
echo "🎉 Services registered to all Consul nodes!"
echo ""
echo "Verification:"
for node in "${CONSUL_NODES[@]}"; do
echo "Services on ${node}:"
curl -s "http://${node}/v1/catalog/services" | jq -r 'keys[]' | grep -E "(consul-lb|traefik-dashboard)" | sed 's/^/ - /'
done
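Because the service IDs embed the Nomad allocation ID, a redeploy of Traefik leaves stale entries on every agent. A complementary cleanup sketch (not part of the script above), mirroring its node list and ID scheme:

```bash
#!/usr/bin/env bash
# Deregister the services tied to a previous allocation from all three agents.
CONSUL_NODES=(
  "master.tailnet-68f9.ts.net:8500"
  "warden.tailnet-68f9.ts.net:8500"
  "ash3c.tailnet-68f9.ts.net:8500"
)
OLD_ALLOC_ID="$1"   # allocation ID that was registered previously

for node in "${CONSUL_NODES[@]}"; do
  curl -s -X PUT "http://${node}/v1/agent/service/deregister/traefik-consul-lb-${OLD_ALLOC_ID}"
  curl -s -X PUT "http://${node}/v1/agent/service/deregister/traefik-dashboard-${OLD_ALLOC_ID}"
  echo "Deregistered allocation ${OLD_ALLOC_ID} from ${node}"
done
```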

View File

@@ -1,61 +0,0 @@
#!/bin/bash
# Consul配置生成脚本
# 此脚本使用Consul模板从KV存储生成最终的Consul配置文件
set -e
# 配置参数
CONSUL_ADDR="${CONSUL_ADDR:-localhost:8500}"
ENVIRONMENT="${ENVIRONMENT:-dev}"
CONSUL_CONFIG_DIR="${CONSUL_CONFIG_DIR:-/root/mgmt/components/consul/configs}"
CONSUL_TEMPLATE_CMD="${CONSUL_TEMPLATE_CMD:-consul-template}"
echo "开始生成Consul配置文件..."
echo "Consul地址: $CONSUL_ADDR"
echo "环境: $ENVIRONMENT"
echo "配置目录: $CONSUL_CONFIG_DIR"
# 检查Consul连接
echo "检查Consul连接..."
if ! curl -s "$CONSUL_ADDR/v1/status/leader" | grep -q "."; then
echo "错误: 无法连接到Consul服务器 $CONSUL_ADDR"
exit 1
fi
echo "Consul连接成功"
# 检查consul-template是否可用
if ! command -v $CONSUL_TEMPLATE_CMD &> /dev/null; then
echo "错误: consul-template 命令不可用请安装consul-template"
exit 1
fi
# 设置环境变量
export CONSUL_ADDR
export ENVIRONMENT
# 使用consul-template生成配置文件
echo "使用consul-template生成配置文件..."
$CONSUL_TEMPLATE_CMD \
-template="$CONSUL_CONFIG_DIR/consul.hcl.tmpl:$CONSUL_CONFIG_DIR/consul.hcl" \
-once \
-consul-addr="$CONSUL_ADDR"
# 验证生成的配置文件
if [ -f "$CONSUL_CONFIG_DIR/consul.hcl" ]; then
echo "配置文件生成成功: $CONSUL_CONFIG_DIR/consul.hcl"
# 验证配置文件语法
echo "验证配置文件语法..."
if consul validate $CONSUL_CONFIG_DIR/consul.hcl; then
echo "配置文件语法验证通过"
else
echo "错误: 配置文件语法验证失败"
exit 1
fi
else
echo "错误: 配置文件生成失败"
exit 1
fi
echo "Consul配置文件生成完成"

View File

@@ -1,104 +0,0 @@
#!/bin/bash
# Consul变量配置脚本 - 遵循最佳命名规范
# 此脚本将Consul集群配置存储到Consul KV中遵循 config/{environment}/{provider}/{region_or_service}/{key} 格式
set -e
# 配置参数
CONSUL_ADDR="${CONSUL_ADDR:-localhost:8500}"
ENVIRONMENT="${ENVIRONMENT:-dev}"
CONSUL_CONFIG_DIR="${CONSUL_CONFIG_DIR:-/root/mgmt/components/consul/configs}"
echo "开始配置Consul变量遵循最佳命名规范..."
echo "Consul地址: $CONSUL_ADDR"
echo "环境: $ENVIRONMENT"
# 检查Consul连接
echo "检查Consul连接..."
if ! curl -s "$CONSUL_ADDR/v1/status/leader" | grep -q "."; then
echo "错误: 无法连接到Consul服务器 $CONSUL_ADDR"
exit 1
fi
echo "Consul连接成功"
# 创建Consul集群配置变量
echo "创建Consul集群配置变量..."
# 基础配置
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/cluster/data_dir" -d "/opt/consul/data"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/cluster/raft_dir" -d "/opt/consul/raft"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/cluster/datacenter" -d "dc1"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/cluster/bootstrap_expect" -d "3"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/cluster/log_level" -d "INFO"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/cluster/encrypt_key" -d "YourEncryptionKeyHere"
# UI配置
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/ui/enabled" -d "true"
# 网络配置
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/network/client_addr" -d "0.0.0.0"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/network/bind_interface" -d "eth0"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/network/advertise_interface" -d "eth0"
# 端口配置
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/ports/dns" -d "8600"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/ports/http" -d "8500"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/ports/https" -d "-1"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/ports/grpc" -d "8502"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/ports/grpc_tls" -d "8503"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/ports/serf_lan" -d "8301"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/ports/serf_wan" -d "8302"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/ports/server" -d "8300"
# 节点配置
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/nodes/master/ip" -d "100.117.106.136"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/nodes/ash3c/ip" -d "100.116.80.94"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/nodes/warden/ip" -d "100.122.197.112"
# 服务发现配置
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/service/enable_script_checks" -d "true"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/service/enable_local_script_checks" -d "true"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/service/enable_service_script" -d "true"
# 性能配置
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/performance/raft_multiplier" -d "1"
# 日志配置
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/log/enable_syslog" -d "false"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/log/log_file" -d "/var/log/consul/consul.log"
# 连接配置
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/connection/reconnect_timeout" -d "30s"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/connection/reconnect_timeout_wan" -d "30s"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/connection/session_ttl_min" -d "10s"
# Autopilot配置
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/autopilot/cleanup_dead_servers" -d "true"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/autopilot/last_contact_threshold" -d "200ms"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/autopilot/max_trailing_logs" -d "250"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/autopilot/server_stabilization_time" -d "10s"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/autopilot/disable_upgrade_migration" -d "false"
# 添加领导者优先级配置
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/autopilot/redundancy_zone_tag_master" -d "vice_president"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/autopilot/redundancy_zone_tag_warden" -d "president"
# 快照配置
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/snapshot/enabled" -d "true"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/snapshot/interval" -d "24h"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/snapshot/retain" -d "30"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/snapshot/name" -d "consul-snapshot-{{.Timestamp}}"
# 备份配置
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/backup/enabled" -d "true"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/backup/interval" -d "6h"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/backup/retain" -d "7"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/backup/name" -d "consul-backup-{{.Timestamp}}"
echo "Consul变量配置完成"
# 验证配置
echo "验证配置..."
curl -s "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/?keys" | jq -r '.[]' | head -10
echo "Consul变量配置脚本执行完成"

View File

@@ -1,261 +0,0 @@
#!/bin/bash
# Consul 变量和存储配置脚本
# 用于增强Consul集群功能
set -e
# 颜色输出
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color
# 日志函数
log_info() {
echo -e "${GREEN}[INFO]${NC} $1"
}
log_warn() {
echo -e "${YELLOW}[WARN]${NC} $1"
}
log_error() {
echo -e "${RED}[ERROR]${NC} $1"
}
# 默认Consul地址
CONSUL_ADDR=${CONSUL_ADDR:-"http://localhost:8500"}
# 检查Consul连接
check_consul() {
log_info "检查Consul连接..."
if curl -s "${CONSUL_ADDR}/v1/status/leader" > /dev/null; then
log_info "Consul连接正常"
return 0
else
log_error "无法连接到Consul: ${CONSUL_ADDR}"
return 1
fi
}
# 配置Consul变量
setup_variables() {
log_info "配置Consul变量..."
# 环境变量
ENVIRONMENT=${ENVIRONMENT:-"dev"}
# 创建基础配置结构
log_info "创建基础配置结构..."
# 应用配置
curl -s -X PUT "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT}/app/name" -d "my-application" > /dev/null
curl -s -X PUT "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT}/app/version" -d "1.0.0" > /dev/null
curl -s -X PUT "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT}/app/environment" -d "${ENVIRONMENT}" > /dev/null
# 数据库配置
curl -s -X PUT "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT}/database/host" -d "db.example.com" > /dev/null
curl -s -X PUT "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT}/database/port" -d "5432" > /dev/null
curl -s -X PUT "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT}/database/name" -d "myapp_db" > /dev/null
# 缓存配置
curl -s -X PUT "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT}/cache/host" -d "redis.example.com" > /dev/null
curl -s -X PUT "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT}/cache/port" -d "6379" > /dev/null
# 消息队列配置
curl -s -X PUT "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT}/mq/host" -d "mq.example.com" > /dev/null
curl -s -X PUT "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT}/mq/port" -d "5672" > /dev/null
# 特性开关
curl -s -X PUT "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT}/features/new_ui" -d "true" > /dev/null
curl -s -X PUT "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT}/features/advanced_analytics" -d "false" > /dev/null
log_info "Consul变量配置完成"
}
# 配置Consul存储
setup_storage() {
log_info "配置Consul存储..."
# 创建存储配置
# 注意这些配置需要在Consul配置文件中启用相应的存储后端
# 持久化存储配置
curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/data_dir" -d "/opt/consul/data" > /dev/null
curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/raft_dir" -d "/opt/consul/raft" > /dev/null
# 快照配置
curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/snapshot_enabled" -d "true" > /dev/null
curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/snapshot_interval" -d "24h" > /dev/null
curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/snapshot_retention" -d "30" > /dev/null
# 备份配置
curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/backup_enabled" -d "true" > /dev/null
curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/backup_interval" -d "6h" > /dev/null
curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/backup_retention" -d "7" > /dev/null
# 自动清理配置
curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/autopilot/cleanup_dead_servers" -d "true" > /dev/null
curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/autopilot/last_contact_threshold" -d "200ms" > /dev/null
curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/autopilot/max_trailing_logs" -d "250" > /dev/null
curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/autopilot/server_stabilization_time" -d "10s" > /dev/null
curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/autopilot/redundancy_zone_tag" -d "" > /dev/null
curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/autopilot/disable_upgrade_migration" -d "false" > /dev/null
curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/autopilot/upgrade_version_tag" -d "" > /dev/null
log_info "Consul存储配置完成"
}
# 创建Consul配置文件
create_consul_config() {
log_info "创建Consul配置文件..."
# 创建配置目录
mkdir -p /root/mgmt/components/consul/configs
# 创建基础配置文件
cat > /root/mgmt/components/consul/configs/consul.hcl << EOF
# Consul 基础配置
data_dir = "/opt/consul/data"
raft_dir = "/opt/consul/raft"
# 启用UI
ui_config {
enabled = true
}
# 数据中心配置
datacenter = "dc1"
# 服务器配置
server = true
bootstrap_expect = 3
# 客户端地址
client_addr = "0.0.0.0"
# 绑定地址
bind_addr = "{{ GetInterfaceIP `eth0` }}"
# 广告地址
advertise_addr = "{{ GetInterfaceIP `eth0` }}"
# 端口配置
ports {
dns = 8600
http = 8500
https = -1
grpc = 8502
grpc_tls = 8503
serf_lan = 8301
serf_wan = 8302
server = 8300
}
# 连接其他节点
retry_join = ["100.117.106.136", "100.116.80.94", "100.122.197.112"]
# 启用服务发现
enable_service_script = true
# 启用脚本检查
enable_script_checks = true
# 启用本地脚本检查
enable_local_script_checks = true
# 性能调优
performance {
raft_multiplier = 1
}
# 日志配置
log_level = "INFO"
enable_syslog = false
log_file = "/var/log/consul/consul.log"
# 自动加密
encrypt = "YourEncryptionKeyHere"
# 重用端口
reconnect_timeout = "30s"
reconnect_timeout_wan = "30s"
# 会话TTL
session_ttl_min = "10s"
# 自动清理
autopilot {
cleanup_dead_servers = true
last_contact_threshold = "200ms"
max_trailing_logs = 250
server_stabilization_time = "10s"
redundancy_zone_tag = ""
disable_upgrade_migration = false
upgrade_version_tag = ""
}
# 快照配置
snapshot {
enabled = true
interval = "24h"
retain = 30
name = "consul-snapshot-{{.Timestamp}}"
}
# 备份配置
backup {
enabled = true
interval = "6h"
retain = 7
name = "consul-backup-{{.Timestamp}}"
}
EOF
log_info "Consul配置文件创建完成: /root/mgmt/components/consul/configs/consul.hcl"
}
# 显示配置
show_config() {
log_info "显示Consul变量配置..."
echo "=========================================="
curl -s "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT:-dev}/?recurse" | jq -r '.[] | "\(.Key): \(.Value | @base64d)"'
echo "=========================================="
log_info "显示Consul存储配置..."
echo "=========================================="
curl -s "${CONSUL_ADDR}/v1/kv/storage/?recurse" | jq -r '.[] | "\(.Key): \(.Value | @base64d)"'
echo "=========================================="
}
# 主函数
main() {
log_info "开始配置Consul变量和存储..."
# 检查Consul连接
check_consul
# 配置变量
setup_variables
# 配置存储
setup_storage
# 创建配置文件
create_consul_config
# 显示配置
show_config
log_info "Consul变量和存储配置完成"
# 提示下一步
log_info "下一步操作:"
log_info "1. 重启Consul服务以应用新配置"
log_info "2. 验证配置是否生效"
log_info "3. 根据需要调整配置参数"
}
# 执行主函数
main "$@"

View File

@@ -1,149 +0,0 @@
#!/bin/bash
# 环境设置脚本
# 用于设置开发环境的必要组件和依赖
set -euo pipefail
# 颜色定义
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color
# 日志函数
log_info() {
echo -e "${BLUE}[INFO]${NC} $1"
}
log_success() {
echo -e "${GREEN}[SUCCESS]${NC} $1"
}
log_warning() {
echo -e "${YELLOW}[WARNING]${NC} $1"
}
log_error() {
echo -e "${RED}[ERROR]${NC} $1"
}
# 检查必要的工具
check_dependencies() {
log_info "检查系统依赖..."
local deps=("git" "curl" "wget" "jq" "docker" "podman")
local missing_deps=()
for dep in "${deps[@]}"; do
if ! command -v "$dep" &> /dev/null; then
missing_deps+=("$dep")
fi
done
if [ ${#missing_deps[@]} -ne 0 ]; then
log_warning "缺少以下依赖: ${missing_deps[*]}"
log_info "请安装缺少的依赖后重新运行"
return 1
fi
log_success "所有依赖检查通过"
}
# 设置环境变量
setup_environment_variables() {
log_info "设置环境变量..."
# 创建环境变量文件
cat > .env << EOF
# 项目环境变量
PROJECT_ROOT=$(pwd)
SCRIPTS_DIR=\${PROJECT_ROOT}/scripts
# Vault 配置
VAULT_ADDR=http://127.0.0.1:8200
VAULT_DEV_ROOT_TOKEN_ID=myroot
# Consul 配置
CONSUL_HTTP_ADDR=http://127.0.0.1:8500
# Nomad 配置
NOMAD_ADDR=http://127.0.0.1:4646
# MCP 配置
MCP_SERVER_PORT=3000
EOF
log_success "环境变量文件已创建: .env"
}
# 创建必要的目录
create_directories() {
log_info "创建必要的目录..."
local dirs=(
"logs"
"tmp"
"data"
"backups/vault"
"backups/consul"
"backups/nomad"
)
for dir in "${dirs[@]}"; do
mkdir -p "$dir"
log_info "创建目录: $dir"
done
log_success "目录创建完成"
}
# 设置脚本权限
setup_script_permissions() {
log_info "设置脚本执行权限..."
find scripts/ -name "*.sh" -exec chmod +x {} \;
log_success "脚本权限设置完成"
}
# 初始化 Git hooks如果需要
setup_git_hooks() {
log_info "设置 Git hooks..."
if [ -d ".git" ]; then
# 创建 pre-commit hook
cat > .git/hooks/pre-commit << 'EOF'
#!/bin/bash
# 运行基本的代码检查
echo "运行 pre-commit 检查..."
# 检查脚本语法
find scripts/ -name "*.sh" -exec bash -n {} \; || exit 1
echo "Pre-commit 检查通过"
EOF
chmod +x .git/hooks/pre-commit
log_success "Git hooks 设置完成"
else
log_warning "不是 Git 仓库,跳过 Git hooks 设置"
fi
}
# 主函数
main() {
log_info "开始环境设置..."
check_dependencies || exit 1
setup_environment_variables
create_directories
setup_script_permissions
setup_git_hooks
log_success "环境设置完成!"
log_info "请运行 'source .env' 来加载环境变量"
}
# 执行主函数
main "$@"

Some files were not shown because too many files have changed in this diff.