Clean up repository: remove backup files and reorganize infrastructure components
parent e5aa00d6f9
commit 1c994f9f60
@@ -1 +0,0 @@
/mnt/fnsync/mcp/mcp_shared_config.json
@@ -1,41 +0,0 @@
# MCP Configuration Sharing

This project implements a solution for sharing MCP (Model Context Protocol) configuration across multiple IDEs on different hosts, using an NFS volume for cross-host synchronization.

## Configuration Structure

- `/root/.mcp/mcp_settings.json` - primary MCP configuration file (symlink pointing to the NFS volume)
- `/mnt/fnsync/mcp/mcp_shared_config.json` - unified configuration file on the NFS volume (the authoritative source)
- `mcp_shared_config.json` - symlink pointing to the configuration file on the NFS volume
- `sync_mcp_config.sh` - sync script that copies the unified configuration to individual IDEs
- `sync_all_mcp_configs.sh` - full sync script that syncs to all possible IDEs and AI assistants
- `.kilocode/mcp.json` - symlink pointing to the shared configuration
- configuration files for other IDEs and AI assistants

## Unified Configuration Contents

The following MCP servers are merged:

### Standard Servers
- context7: library documentation and code examples
- filesystem: file system access
- sequentialthinking: sequential-thinking tool
- git: Git operations
- time: time-related operations
- memory: knowledge graph and memory management
- tavily: web search

## Usage

1. **Update the configuration**: edit `/mnt/fnsync/mcp/mcp_shared_config.json` to change the MCP server configuration (or go through the symlink `/root/.mcp/mcp_settings.json`)
2. **Sync the configuration** (see the sketch below):
   - run `./sync_mcp_config.sh` to sync to a specific IDE
   - run `./sync_all_mcp_configs.sh` to sync to all IDEs and AI assistants
3. **Verify the configuration**: confirm that MCP works correctly in each IDE

## Maintenance Notes

- All MCP configuration changes should be made in `/mnt/fnsync/mcp/mcp_shared_config.json` (the authoritative source)
- `/root/.mcp/mcp_settings.json` is now a symlink pointing to the unified configuration on the NFS volume
- Because an NFS volume is used, configuration changes propagate across hosts automatically
- When adding a new IDE, link its configuration file to, or copy it from, `/mnt/fnsync/mcp/mcp_shared_config.json`
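Neither sync script's contents appear in this commit. As a rough illustration only, a minimal `sync_mcp_config.sh` might simply copy the authoritative file to each IDE's expected location (the target paths below are assumptions, not taken from the repository):

```bash
#!/usr/bin/env bash
# Hedged sketch: copy the authoritative MCP config to per-IDE locations,
# for IDEs that cannot follow a symlink. Target paths are illustrative.
set -euo pipefail

SRC="/mnt/fnsync/mcp/mcp_shared_config.json"

TARGETS=(
  "$HOME/.mcp/mcp_settings.json"
  ".kilocode/mcp.json"
)

for dst in "${TARGETS[@]}"; do
  mkdir -p "$(dirname "$dst")"
  cp -f "$SRC" "$dst"
  echo "synced $SRC -> $dst"
done
```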
README.md.backup (572 lines)
@@ -1,572 +0,0 @@
# 🏗️ Infrastructure Management Project

This is a modern multi-cloud infrastructure management platform focused on integrated management with OpenTofu, Ansible, and Nomad + Podman.

## 📝 Important Reminder (Sticky Note)

### ✅ Consul Cluster Status Update

**Current status**: the Consul cluster is healthy and all nodes are running normally.

**Cluster information**:
- **Leader**: warden (100.122.197.112:8300)
- **Node count**: 3 server nodes
- **Health**: all nodes pass their health checks
- **Node list**:
  - master (100.117.106.136) - primary node, Korea
  - ash3c (100.116.80.94) - server node, USA
  - warden (100.122.197.112) - server node, Beijing; current cluster leader

**Configuration status**:
- The Ansible inventory matches the actual cluster state
- All nodes run in server mode
- bootstrap_expect=3, matching the actual node count

**Dependency chain**:
- Tailscale (day 1) ✅
- Ansible (day 2) ✅
- Nomad (day 3) ✅
- Consul (day 4) ✅ **completed**
- Terraform (day 5) ✅ **progressing well**
- Vault (day 6) ⏳ planned
- Waypoint (day 7) ⏳ planned

**Next steps**:
- Continue work on Terraform state management
- Prepare the Vault secrets-management integration
- Plan the Waypoint application deployment workflow

---
## 🎯 Project Features

- **🌩️ Multi-cloud support**: Oracle Cloud, Huawei Cloud, Google Cloud, AWS, DigitalOcean
- **🏗️ Infrastructure as code**: cloud resources managed with OpenTofu
- **⚙️ Configuration management**: automated configuration and deployment with Ansible
- **🐳 Container orchestration**: Nomad cluster management with the Podman container runtime
- **🔄 CI/CD**: automated pipelines with Gitea Actions
- **📊 Monitoring**: Prometheus + Grafana monitoring stack
- **🔐 Security**: multi-layer protection and compliance
## 🔄 Architecture Layers and Separation of Responsibilities

### ⚠️ Important: Distinguishing Terraform from Nomad

This project uses a layered architecture that clearly separates what each tool is responsible for, to avoid confusion:

#### 1. **Terraform/OpenTofu layer - infrastructure lifecycle management**
- **Responsibility**: manage the lifecycle of the compute resources (virtual machines) provided by cloud vendors
- **Scope**:
  - Create, update, and delete VM instances
  - Manage network resources (VCNs, subnets, security groups, etc.)
  - Manage storage resources (block storage, object storage, etc.)
  - Manage load balancers and other cloud services
- **Goal**: keep the underlying infrastructure correctly configured and its state managed

#### 2. **Nomad layer - application scheduling and orchestration**
- **Responsibility**: allocate resources and orchestrate applications inside VMs that are already running
- **Scope**:
  - Schedule and run containerized applications on existing VMs
  - Manage application lifecycles (start, stop, update)
  - Allocate and limit resources (CPU, memory, storage)
  - Service discovery and load balancing
- **Goal**: run application services efficiently on existing infrastructure

#### 3. **Key distinction**
- **Terraform** manages the lifecycle of **the virtual machines themselves**
- **Nomad** schedules the applications that run **inside those virtual machines**
- **Terraform** decides "which VMs exist"
- **Nomad** decides "what runs on them"

#### 4. **Example workflow**
```
1. Terraform creates the VMs (cloud-provider layer)
   ↓
2. The VMs boot and run their operating systems
   ↓
3. Nomad clients are installed and configured on the VMs
   ↓
4. Nomad schedules and runs application containers on the VMs
```

**Important reminder**: these two layers must not be mixed up. Terraform should not manage application-level resources, and Nomad should not create virtual machines. Strictly observing this layering is key to the project's success.
## 📁 Project Structure

```
mgmt/
├── .gitea/workflows/            # CI/CD workflows
├── tofu/                        # OpenTofu infrastructure code (infrastructure lifecycle management)
│   ├── environments/            # Environment configs (dev/staging/prod)
│   ├── modules/                 # Reusable modules
│   ├── providers/               # Cloud provider configs
│   └── shared/                  # Shared configuration
├── configuration/               # Ansible configuration management
│   ├── inventories/             # Host inventories
│   ├── playbooks/               # Playbooks
│   ├── templates/               # Template files
│   └── group_vars/              # Group variables
├── jobs/                        # Nomad job definitions (application scheduling and orchestration)
│   ├── consul/                  # Consul cluster configuration
│   └── podman/                  # Podman-related jobs
├── configs/                     # Configuration files
│   ├── nomad-master.hcl         # Nomad server config
│   └── nomad-ash3c.hcl          # Nomad client config
├── docs/                        # Documentation
├── security/                    # Security configuration
│   ├── certificates/            # Certificates
│   └── policies/                # Security policies
├── tests/                       # Test scripts and reports
│   ├── mcp_servers/             # MCP server test scripts
│   ├── mcp_server_test_report.md  # MCP server test report
│   └── legacy/                  # Legacy test scripts
├── tools/                       # Tools and utilities
├── playbooks/                   # Core Ansible playbooks
└── Makefile                     # Project management commands
```

**Layering notes**:
- The **tofu/** directory holds the Terraform/OpenTofu code that manages the lifecycle of cloud compute resources
- The **jobs/** directory holds the Nomad job definitions that schedule application resources inside existing VMs
- The two directories are kept strictly separate so the responsibility boundary stays clear

**Note:** the project has migrated from Docker Swarm to Nomad + Podman; the old swarm directory is no longer used. All intermediate scripts and test files have been cleaned up, keeping only core configuration files in line with GitOps principles.
## 🔄 GitOps Principles

This project follows a GitOps workflow to keep infrastructure state consistent with the code in the Git repository:

- **Declarative configuration**: all infrastructure and application configuration is stored declaratively in Git
- **Version control and audit**: every change goes through a Git commit, giving a complete history and audit trail
- **Automated synchronization**: CI/CD pipelines automatically apply changes from Git to the real environment
- **State convergence**: the system continuously monitors actual state and corrects any drift from the desired state (see the sketch after this section)

### GitOps Workflow

1. **Declare the desired state**: define the desired state of infrastructure and applications in Git
2. **Commit changes**: apply changes through Git commits
3. **Automatic sync**: the CI/CD system detects changes and applies them to the environment
4. **State verification**: the system verifies that actual state matches desired state
5. **Monitoring and alerting**: state is monitored continuously, with alerts on drift

This workflow ensures consistency, repeatability, and reliability, while providing a full change history and rollback capability.
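The repository's own drift-check tooling is not shown here; as a hedged sketch, convergence for the OpenTofu layer could be approximated with `plan -detailed-exitcode` (a standard Terraform/OpenTofu flag), run against the environment directory from the structure above:

```bash
#!/usr/bin/env bash
# Hedged sketch: detect drift between Git-declared state and the real environment.
# `tofu plan -detailed-exitcode` exits 0 when in sync and 2 when changes are pending.
set -uo pipefail

cd tofu/environments/dev
tofu plan -detailed-exitcode -input=false >/dev/null
case $? in
  0) echo "in sync: no drift detected" ;;
  2) echo "drift detected: actual state differs from Git" ;;
  *) echo "plan failed; cannot determine drift" >&2; exit 1 ;;
esac
```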
## 🚀 Quick Start

### 1. Environment Preparation

```bash
# Clone the project
git clone <repository-url>
cd mgmt

# Check environment status
./mgmt.sh status

# Quick deployment (for development environments)
./mgmt.sh deploy
```

### 2. Configure Cloud Providers

```bash
# Copy the config template
cp tofu/environments/dev/terraform.tfvars.example tofu/environments/dev/terraform.tfvars

# Edit the config file and fill in your cloud provider credentials
vim tofu/environments/dev/terraform.tfvars
```

### 3. Initialize the Infrastructure

```bash
# Initialize OpenTofu
./mgmt.sh tofu init

# Review the execution plan
./mgmt.sh tofu plan

# Apply infrastructure changes
cd tofu/environments/dev && tofu apply
```

### 4. Deploy Nomad Services

```bash
# Deploy the Consul cluster
nomad run /root/mgmt/jobs/consul/consul-cluster.nomad

# List Nomad jobs
nomad job status

# Check node status
nomad node status
```
### ⚠️ Important: Network Access Notes

**Tailscale network access**:
- The Nomad and Consul services in this project are reached over the Tailscale network
- When accessing Nomad (port 4646) and Consul (port 8500), you must use the Tailscale-assigned IP address
- Wrong: `http://127.0.0.1:4646` or `http://localhost:8500` (cannot connect)
- Right: `http://100.x.x.x:4646` or `http://100.x.x.x:8500` (using the Tailscale IP)

**Getting the Tailscale IP**:
```bash
# Show this node's Tailscale IP
tailscale ip -4

# Show all nodes in the Tailscale network
tailscale status
```

**Common problems**:
- On "connection refused" errors, first confirm you are using the correct Tailscale IP
- Make sure the Tailscale service is up and running
- Check that network policy allows access to the relevant ports over the Tailscale interface
- For more detail, see: [Consul and Nomad access lessons learned](.gitea/issues/consul-nomad-access-lesson.md)
### 🔄 Nomad Cluster Leader Rotation and Access Strategy

**Nomad's leader mechanism**:
- Nomad uses the Raft protocol for distributed consensus; the cluster has exactly one leader node
- The leader handles all writes and coordinates cluster state
- If the leader fails, the cluster automatically elects a new one

**Access strategies during leader rotation**:

1. **Dynamic leader discovery**:
```bash
# Query the current leader from any Nomad server
curl -s http://<any-Nomad-server-IP>:4646/v1/status/leader
# Example response: "100.90.159.68:4647" (4647 is the RPC port)

# Use the leader's IP for API calls (the HTTP API listens on 4646)
curl -s http://100.90.159.68:4646/v1/nodes
```

2. **Load-balancing options**:
   - **DNS load balancing**: use Consul DNS and resolve `nomad.service.consul` to reach the current leader
   - **Proxy-layer load balancing**: add health checks to an Nginx/HAProxy configuration so traffic is routed automatically to the active leader node
   - **Client-side retries**: implement retry logic in clients so that, on connection failure, another server node is tried (see the retry sketch after this section)

3. **Recommended access pattern**:
```bash
#!/bin/bash
# Leader-discovery script
# Pick any known Nomad server IP
SERVER_IP="100.116.158.95"
# Query the current leader; the response is "<ip>:4647" (the RPC port), so keep only the IP
LEADER_IP=$(curl -s http://${SERVER_IP}:4646/v1/status/leader | sed 's/"//g' | cut -d: -f1)
# Run commands against the leader's HTTP API on port 4646
nomad node status -address=http://${LEADER_IP}:4646
```

4. **High-availability configuration**:
   - Add all Nomad server nodes to the client configuration
   - Clients automatically connect to an available server node
   - Writes are automatically forwarded to the leader node

**Notes**:
- Leader rotation is automatic and normally needs no manual intervention
- During a leader election the cluster may briefly be unable to serve writes
- Implement sensible retry logic in applications to ride out transient failures during leader switches
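As a rough illustration of the client-side retry idea above (the server IPs are those named elsewhere in this README; the `nomad_try` helper is made up for the sketch):

```bash
#!/usr/bin/env bash
# Hedged sketch: try each known Nomad server until one answers the HTTP API.
NOMAD_SERVERS=("100.116.158.95" "100.90.159.68" "100.81.26.3")

nomad_try() {
  for ip in "${NOMAD_SERVERS[@]}"; do
    # Probe the HTTP API first; skip unreachable servers
    if curl -sf --max-time 3 "http://${ip}:4646/v1/status/leader" >/dev/null; then
      nomad "$@" -address="http://${ip}:4646" && return 0
    fi
  done
  echo "no reachable Nomad server" >&2
  return 1
}

nomad_try node status
```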
## 🛠️ Common Commands

| Command | Description |
|------|------|
| `make status` | Show a project status overview |
| `make deploy` | Quickly deploy all services |
| `make cleanup` | Tear down all deployed services |
| `cd tofu/environments/dev && tofu <cmd>` | OpenTofu management commands |
| `nomad job status` | Show Nomad job status |
| `nomad node status` | Show Nomad node status |
| `podman ps` | List running containers |
| `ansible-playbook playbooks/configure-nomad-clients.yml` | Configure Nomad clients |
| `./run_tests.sh` or `make test-mcp` | Run all MCP server tests |
| `make test-kali` | Run the Kali Linux quick health check |
| `make test-kali-security` | Run the Kali Linux security tool tests |
| `make test-kali-full` | Run the full Kali Linux test suite |
## 🌩️ Supported Cloud Providers

### Oracle Cloud Infrastructure (OCI)
- ✅ Compute instances
- ✅ Networking (VCN, subnets, security groups)
- ✅ Storage (block storage, object storage)
- ✅ Load balancers

### Huawei Cloud
- ✅ Elastic Cloud Server (ECS)
- ✅ Virtual Private Cloud (VPC)
- ✅ Elastic Load Balance (ELB)
- ✅ Elastic Volume Service (EVS)

### Google Cloud Platform
- ✅ Compute Engine
- ✅ VPC networking
- ✅ Cloud Load Balancing
- ✅ Persistent Disk

### Amazon Web Services
- ✅ EC2 instances
- ✅ VPC networking
- ✅ Application Load Balancer
- ✅ EBS storage

### DigitalOcean
- ✅ Droplets
- ✅ VPC networking
- ✅ Load Balancers
- ✅ Block Storage
## 🔄 CI/CD Pipelines

### Infrastructure Deployment Flow
1. **Code commit** → triggers Gitea Actions
2. **OpenTofu plan** → generates the execution plan
3. **Manual review** → changes are approved
4. **OpenTofu apply** → infrastructure changes are applied
5. **Ansible deployment** → configuration and application deployment

### Application Deployment Flow
1. **Application code update** → container image is built
2. **Image push** → image is pushed to the registry
3. **Nomad job update** → job definition is updated
4. **Nomad deployment** → rolling update of the service
5. **Health checks** → deployment state is verified
## 📊 Monitoring and Observability

### Monitoring Components
- **Prometheus**: metrics collection and storage
- **Grafana**: visualization dashboards
- **AlertManager**: alert management
- **Node Exporter**: system metrics export

### Log Management
- **ELK Stack**: Elasticsearch + Logstash + Kibana
- **Fluentd**: log collection and forwarding
- **Structured logging**: standardized JSON format
## 🔐 Security Best Practices

### Infrastructure Security
- **Network isolation**: VPCs, security groups, firewalls
- **Access control**: IAM roles and policies
- **Data encryption**: encryption in transit and at rest
- **Key management**: cloud provider key management services

### Application Security
- **Container security**: image scanning, least privilege
- **Network security**: service mesh, TLS termination
- **Secrets management**: Docker Secrets, Ansible Vault
- **Security auditing**: log monitoring and audits
## 🧪 Testing Strategy

### Infrastructure Testing
- **Syntax checks**: OpenTofu validate
- **Security scanning**: Checkov, tfsec
- **Compliance checks**: OPA (Open Policy Agent)

### Application Testing
- **Unit tests**: application code tests
- **Integration tests**: tests across services
- **End-to-end tests**: full workflow tests

### MCP Server Testing
The project includes a complete MCP (Model Context Protocol) server test suite in the `tests/mcp_servers/` directory:

- **context7 server tests**: verify initialization, tool listing, and search
- **qdrant server tests**: test adding, searching, and deleting documents
- **qdrant-ollama server tests**: verify the vector database plus LLM integration

The suite includes both shell and Python scripts and can exercise MCP servers directly over the JSON-RPC protocol. For detailed test results and fix records, see `tests/mcp_server_test_report.md`.

Running the tests:
```bash
# Run a single test script
cd tests/mcp_servers
./test_local_mcp_servers.sh

# Or run the Python tests
python test_mcp_servers_simple.py
```
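For a feel of what "testing directly over JSON-RPC" means, here is a hedged sketch; the server command is a placeholder and the protocol-version string is an assumption, since the real requests live inside the test scripts:

```bash
#!/usr/bin/env bash
# Hedged sketch: hand-roll one MCP JSON-RPC initialize request over stdio.
# `mcp-server-command` stands in for the actual server binary under test.
printf '%s\n' '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"smoke-test","version":"0.0.1"}}}' \
  | mcp-server-command \
  | head -n 1   # the first output line should be the initialize response
```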
### Kali Linux System Testing
The project includes a complete Kali Linux system test suite in the `configuration/playbooks/test/` directory. The tests cover:

1. **Quick health check** (`kali-health-check.yml`): basic system status checks
2. **Security tool tests** (`kali-security-tools.yml`): test the installation and functioning of various security tools
3. **Full system test** (`test-kali.yml`): comprehensive system testing with report generation
4. **Full test suite** (`kali-full-test-suite.yml`): run all tests in sequence

Running the tests:
```bash
# Kali Linux quick health check
make test-kali

# Kali Linux security tool tests
make test-kali-security

# Full Kali Linux test suite
make test-kali-full
```
## 📚 Documentation

- [Consul cluster troubleshooting](docs/consul-cluster-troubleshooting.md)
- [Disk management](docs/disk-management.md)
- [Nomad NFS setup](docs/nomad-nfs-setup.md)
- [Consul-Terraform integration](docs/setup/consul-terraform-integration.md)
- [OCI credentials setup](docs/setup/oci-credentials-setup.md)
- [Oracle Cloud setup](docs/setup/oracle-cloud-setup.md)

## 🤝 Contributing

1. Fork the project
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request

## 📄 License

This project is released under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🆘 Support

If you run into problems or have questions:

1. Check the [documentation](docs/)
2. Search the [issues](../../issues)
3. Open a new [issue](../../issues/new)
## ⚠️ Key Lessons Learned

### Separating Terraform and Nomad Responsibilities
**Problem**: in infrastructure management it is easy to blur the responsibilities of Terraform and Nomad, which muddles the architecture.

**Root cause**: although both are infrastructure tools, Terraform and Nomad sit at different layers of the architecture and manage different kinds of resources.

**Solution**:
1. **Make the layering explicit**:
   - **Terraform/OpenTofu**: lifecycle management of the compute resources (VMs) provided by cloud vendors
   - **Nomad**: scheduling and orchestrating applications inside existing VMs

2. **Keep the boundary sharp**:
   - Terraform decides "which VMs exist"
   - Nomad decides "what runs on them"
   - Neither should reach across and manage the other's resources

3. **Separate the workflows**:
```
1. Terraform creates the VMs (cloud-provider layer)
   ↓
2. The VMs boot and run their operating systems
   ↓
3. Nomad clients are installed and configured on the VMs
   ↓
4. Nomad schedules and runs application containers on the VMs
```

**Important reminder**: strictly observing this layering is key to the project's success. Any blurring of these two layers leads to architectural confusion and management headaches.
### Consul and Nomad Access Problems
**Problem**: attempts to reach Consul via `http://localhost:8500` or `http://127.0.0.1:8500` fail to connect.

**Root cause**: in this project, the Consul and Nomad services run in the cluster under Nomad + Podman and are reached over the Tailscale network. They do not run locally, so they cannot be reached via localhost.

**Solution**:
1. **Use Tailscale IPs**: services must be reached through their Tailscale-assigned IP addresses
```bash
# Show this node's Tailscale IP
tailscale ip -4

# Show all nodes in the Tailscale network
tailscale status

# Access Consul (with the actual Tailscale IP)
curl http://100.x.x.x:8500/v1/status/leader

# Access Nomad (with the actual Tailscale IP)
curl http://100.x.x.x:4646/v1/status/leader
```

2. **Service discovery**: the Consul cluster has 3 nodes and the Nomad cluster more than ten; identify the node a service actually runs on

3. **Cluster layout**:
   - Consul cluster: 3 nodes (kr-master, us-ash3c, bj-warden)
   - Nomad cluster: ten-plus nodes, both servers and clients

**Important reminder**: during development and debugging, always use the Tailscale IP rather than localhost to reach cluster services. This is a basic requirement of the project's architecture and must be followed strictly.
### Consul Cluster Configuration Management Lessons
**Problem**: the Consul cluster configuration files disagreed with the cluster's actual running state, leading to confused cluster management and misconfiguration.

**Root cause**: node information in the Ansible inventory files did not match the nodes actually in the Consul cluster, including node roles, node counts, and the bootstrap expect value.

**Solution**:
1. **Verify cluster state regularly**: use the Consul API to check the actual cluster state and keep the configuration files in line with it (see the sketch after this section)
```bash
# List Consul cluster nodes
curl -s http://<consul-server>:8500/v1/catalog/nodes

# Show node details
curl -s http://<consul-server>:8500/v1/agent/members

# Show the cluster leader
curl -s http://<consul-server>:8500/v1/status/leader
```

2. **Keep configuration files consistent**: make sure all related inventory files (such as `csol-consul-nodes.ini`, `consul-nodes.ini`, `consul-cluster.ini`) agree on:
   - the server node list and count
   - the client node list and count
   - the `bootstrap_expect` value (it must match the actual number of server nodes)
   - node roles and IP addresses

3. **Identify node roles correctly**: confirm each node's actual role via the API, so servers are not misconfigured as clients or vice versa
```json
// Example node record returned by the API
{
  "Name": "warden",
  "Addr": "100.122.197.112",
  "Port": 8300,
  "Status": 1,
  "ProtocolVersion": 2,
  "Delegate": 1,
  "Server": true  // confirms the node role
}
```

4. **Update procedure**: when configuration and actual state disagree, update as follows:
   - fetch the actual cluster state via the API
   - update all related configuration files from the actual state
   - make sure the information agrees across all configuration files
   - update the descriptions and comments in the files to reflect the latest cluster state

**Case in point**:
- **Initial state**: the configuration files showed 2 server nodes and 5 client nodes, with `bootstrap_expect=2`
- **Actual state**: the Consul cluster ran 3 server nodes (master, ash3c, warden), no client nodes, with `expect=3`
- **Fix**: update every configuration file to 3 server nodes, remove all client node entries, and set `bootstrap_expect` to 3

**Important reminder**: Consul cluster configuration must stay strictly in line with the cluster's actual running state; any mismatch can destabilize the cluster or break functionality. Regularly verifying the cluster state through the Consul API and updating the configuration files promptly is key to keeping the cluster stable.
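A hedged sketch of the "verify regularly" idea (jq and one reachable server address are assumed; set the expected count to match the inventory):

```bash
#!/usr/bin/env bash
# Hedged sketch: compare the live server count against what the inventory expects.
CONSUL_ADDR="http://100.117.106.136:8500"   # any reachable Consul server
EXPECTED_SERVERS=3                          # should equal bootstrap_expect

# Server agents report the serf tag role=consul; clients report role=node
actual=$(curl -s "${CONSUL_ADDR}/v1/agent/members" \
  | jq '[.[] | select(.Tags.role == "consul")] | length')

if [ "$actual" -ne "$EXPECTED_SERVERS" ]; then
  echo "mismatch: ${actual} live servers, inventory expects ${EXPECTED_SERVERS}" >&2
  exit 1
fi
echo "ok: ${actual} servers, matching bootstrap_expect"
```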
## 🎉 Acknowledgements

Thanks to all the developers and community members who have contributed to this project!

## Script Organization

The project scripts have been reorganized by function under the `scripts/` directory:

- `scripts/setup/` - environment setup and initialization
- `scripts/deployment/` - deployment scripts
- `scripts/testing/` - test scripts
- `scripts/utilities/` - utility scripts
- `scripts/mcp/` - MCP server related
- `scripts/ci-cd/` - CI/CD related

For details, see the [script index](scripts/SCRIPT_INDEX.md).
@@ -0,0 +1,104 @@
---
# Ansible Playbook: deploy the Consul client to all Nomad nodes
- name: Deploy Consul Client to Nomad nodes
  hosts: nomad_clients:nomad_servers
  become: yes
  vars:
    consul_version: "1.21.5"
    consul_datacenter: "dc1"
    consul_servers:
      - "100.117.106.136:8300"  # master (Korea)
      - "100.122.197.112:8300"  # warden (Beijing)
      - "100.116.80.94:8300"    # ash3c (USA)

  tasks:
    - name: Update APT cache
      apt:
        update_cache: yes

    - name: Install consul via APT (assumes the APT source already exists)
      apt:
        name: consul={{ consul_version }}-*
        state: present
        update_cache: yes
      register: consul_installed

    - name: Create consul user (if not exists)
      user:
        name: consul
        system: yes
        shell: /bin/false
        home: /opt/consul
        create_home: yes

    - name: Create consul directories
      file:
        path: "{{ item }}"
        state: directory
        owner: consul
        group: consul
        mode: '0755'
      loop:
        - /opt/consul
        - /opt/consul/data
        - /etc/consul.d
        - /var/log/consul

    - name: Get node Tailscale IP
      shell: ip addr show tailscale0 | grep 'inet ' | awk '{print $2}' | cut -d'/' -f1
      register: tailscale_ip
      failed_when: tailscale_ip.stdout == ""

    - name: Create consul client configuration
      template:
        src: templates/consul-client.hcl.j2
        dest: /etc/consul.d/consul.hcl
        owner: consul
        group: consul
        mode: '0644'
      notify: restart consul

    - name: Create consul systemd service
      template:
        src: templates/consul.service.j2
        dest: /etc/systemd/system/consul.service
        owner: root
        group: root
        mode: '0644'
      notify: reload systemd

    - name: Enable and start consul service
      systemd:
        name: consul
        enabled: yes
        state: started
      notify: restart consul

    - name: Wait for consul to be ready
      uri:
        url: "http://{{ tailscale_ip.stdout }}:8500/v1/status/leader"
        status_code: 200
        timeout: 5
      register: consul_leader_status
      until: consul_leader_status.status == 200
      retries: 30
      delay: 5

    - name: Verify consul cluster membership
      shell: consul members -status=alive -format=json | jq -r '.[].Name'
      register: consul_members
      changed_when: false

    - name: Display cluster status
      debug:
        msg: "Node {{ inventory_hostname.split('.')[0] }} joined cluster with {{ consul_members.stdout_lines | length }} members"

  handlers:
    - name: reload systemd
      systemd:
        daemon_reload: yes

    - name: restart consul
      systemd:
        name: consul
        state: restarted
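A hedged usage sketch: the playbook and the inventory below are unnamed in this commit view, so the file names here are assumptions:

```bash
# Assuming the playbook is saved as deploy-consul-client.yml and the
# inventory below as consul-client-inventory.yml:
ansible-playbook -i consul-client-inventory.yml deploy-consul-client.yml --check   # dry run first
ansible-playbook -i consul-client-inventory.yml deploy-consul-client.yml
```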
@@ -0,0 +1,59 @@
---
# Ansible Inventory for Consul Client Deployment
all:
  children:
    consul_servers:
      hosts:
        master.tailnet-68f9.ts.net:
          ansible_host: 100.117.106.136
          region: korea
        warden.tailnet-68f9.ts.net:
          ansible_host: 100.122.197.112
          region: beijing
        ash3c.tailnet-68f9.ts.net:
          ansible_host: 100.116.80.94
          region: usa

    nomad_servers:
      hosts:
        # Nomad server nodes also need a Consul client
        semaphore.tailnet-68f9.ts.net:
          ansible_host: 100.116.158.95
          region: korea
        ch3.tailnet-68f9.ts.net:
          ansible_host: 100.86.141.112
          region: switzerland
        ash1d.tailnet-68f9.ts.net:
          ansible_host: 100.81.26.3
          region: usa
        ash2e.tailnet-68f9.ts.net:
          ansible_host: 100.103.147.94
          region: usa
        ch2.tailnet-68f9.ts.net:
          ansible_host: 100.90.159.68
          region: switzerland
        de.tailnet-68f9.ts.net:
          ansible_host: 100.120.225.29
          region: germany
        onecloud1.tailnet-68f9.ts.net:
          ansible_host: 100.98.209.50
          region: unknown

    nomad_clients:
      hosts:
        # Nodes that still need a Consul client deployed
        influxdb1.tailnet-68f9.ts.net:
          ansible_host: "{{ influxdb1_ip }}"  # actual IP still to be filled in
          region: beijing
        browser.tailnet-68f9.ts.net:
          ansible_host: "{{ browser_ip }}"  # actual IP still to be filled in
          region: beijing
        # hcp1 already has a Consul client; optionally reconfigure it here
        # hcp1.tailnet-68f9.ts.net:
        #   ansible_host: 100.97.62.111
        #   region: beijing

  vars:
    ansible_user: root
    ansible_ssh_private_key_file: ~/.ssh/id_rsa
    consul_datacenter: dc1
@@ -0,0 +1,61 @@
# Consul Client Configuration for {{ inventory_hostname }}
datacenter = "{{ consul_datacenter }}"
data_dir = "/opt/consul/data"
log_level = "INFO"
node_name = "{{ inventory_hostname.split('.')[0] }}"
bind_addr = "{{ tailscale_ip.stdout }}"

# Client mode (not server)
server = false

# Connect to the Consul servers (the three-node cluster)
retry_join = [
  "100.117.106.136",  # master (Korea)
  "100.122.197.112",  # warden (Beijing)
  "100.116.80.94"     # ash3c (USA)
]

# Performance optimization
performance {
  raft_multiplier = 5
}

# Ports configuration
ports {
  grpc = 8502
  http = 8500
  dns = 8600
}

# Enable Connect for service mesh
connect {
  enabled = true
}

# Cache configuration for performance
cache {
  entry_fetch_max_burst = 42
  entry_fetch_rate = 30
}

# Node metadata
node_meta = {
  region = "{{ region | default('unknown') }}"
  zone = "nomad-server"
}

# UI disabled for clients
ui_config {
  enabled = false
}

# ACL configuration (if needed)
acl = {
  enabled = false
  default_policy = "allow"
}

# Logging
log_file = "/var/log/consul/consul.log"
log_rotate_duration = "24h"
log_rotate_max_files = 7
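Once rendered to `/etc/consul.d/consul.hcl`, the configuration can be sanity-checked with Consul's built-in validator before the service is (re)started:

```bash
# Validate every config file in the agent's config directory
consul validate /etc/consul.d/
```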
@@ -0,0 +1,26 @@
[Unit]
Description=Consul Client
Documentation=https://www.consul.io/
Requires=network-online.target
After=network-online.target
ConditionFileNotEmpty=/etc/consul.d/consul.hcl

[Service]
Type=notify
User=consul
Group=consul
ExecStart=/usr/bin/consul agent -config-dir=/etc/consul.d
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
LimitNOFILE=65536

# Security settings
NoNewPrivileges=yes
PrivateTmp=yes
ProtectHome=yes
ProtectSystem=strict
ReadWritePaths=/opt/consul /var/log/consul

[Install]
WantedBy=multi-user.target
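If the unit is installed by hand rather than through the playbook above, the usual activation sequence applies:

```bash
systemctl daemon-reload
systemctl enable --now consul
systemctl status consul --no-pager
```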
@@ -0,0 +1,19 @@
# Consul Configuration

## Deployment

```bash
nomad job run components/consul/jobs/consul-cluster.nomad
```

## Job Details

- **Job name**: `consul-cluster-nomad`
- **Type**: service
- **Nodes**: master, ash3c, warden

## Access

- Master: `http://master.tailnet-68f9.ts.net:8500`
- Ash3c: `http://ash3c.tailnet-68f9.ts.net:8500`
- Warden: `http://warden.tailnet-68f9.ts.net:8500`
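A quick way to confirm that all three endpoints above agree on a leader (plain curl, no assumptions beyond the URLs listed):

```bash
for host in master ash3c warden; do
  printf '%s: ' "$host"
  curl -s "http://${host}.tailnet-68f9.ts.net:8500/v1/status/leader"
  echo
done
```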
@@ -1,412 +0,0 @@
job "consul-cluster-dynamic" {
  datacenters = ["dc1"]
  type = "service"

  group "consul-master" {
    count = 1

    constraint {
      attribute = "${node.unique.name}"
      value     = "kr-master"
    }

    network {
      port "http" {
        static = 8500
      }
      port "rpc" {
        static = 8300
      }
      port "serf_lan" {
        static = 8301
      }
      port "serf_wan" {
        static = 8302
      }
    }

    task "consul" {
      driver = "exec"

      # Generate the configuration file from a template
      template {
        data = <<EOF
# Consul configuration file - dynamically generated
# This file is generated by consul-template from configuration held in the Consul KV store

# Base configuration
data_dir = "/opt/consul/data"
raft_dir = "/opt/consul/raft"

# Enable the UI
ui_config {
  enabled = true
}

# Datacenter
datacenter = "dc1"

# Server configuration
server = true
bootstrap_expect = 3

# Network configuration
client_addr = "master"
bind_addr = "master"
advertise_addr = "master"

# Ports
ports {
  dns = 8600
  http = 8500
  https = -1
  grpc = 8502
  grpc_tls = 8503
  serf_lan = 8301
  serf_wan = 8302
  server = 8300
}

# Cluster join
retry_join = ["ash3c", "warden"]

# Service discovery
enable_service_script = true
enable_script_checks = true
enable_local_script_checks = true

# Performance tuning
performance {
  raft_multiplier = 1
}

# Logging
log_level = "INFO"
enable_syslog = false
log_file = "/var/log/consul/consul.log"

# Security
encrypt = "YourEncryptionKeyHere"

# Connection settings
reconnect_timeout = "30s"
reconnect_timeout_wan = "30s"
session_ttl_min = "10s"

# Autopilot
autopilot {
  cleanup_dead_servers = true
  last_contact_threshold = "200ms"
  max_trailing_logs = 250
  server_stabilization_time = "10s"
  redundancy_zone_tag = ""
  disable_upgrade_migration = false
  upgrade_version_tag = ""
}

# Snapshots
snapshot {
  enabled = true
  interval = "24h"
  retain = 30
  name = "consul-snapshot-{{.Timestamp}}"
}

# Backups
backup {
  enabled = true
  interval = "6h"
  retain = 7
  name = "consul-backup-{{.Timestamp}}"
}
EOF
        destination = "local/consul.hcl"
      }

      config {
        command = "consul"
        args = [
          "agent",
          "-config-dir=local"
        ]
      }

      resources {
        cpu    = 300
        memory = 512
      }
    }
  }

  group "consul-ash3c" {
    count = 1

    constraint {
      attribute = "${node.unique.name}"
      value     = "us-ash3c"
    }

    network {
      port "http" {
        static = 8500
      }
      port "rpc" {
        static = 8300
      }
      port "serf_lan" {
        static = 8301
      }
      port "serf_wan" {
        static = 8302
      }
    }

    task "consul" {
      driver = "exec"

      # Generate the configuration file from a template
      template {
        data = <<EOF
# Consul configuration file - dynamically generated
# This file is generated by consul-template from configuration held in the Consul KV store

# Base configuration
data_dir = "/opt/consul/data"
raft_dir = "/opt/consul/raft"

# Enable the UI
ui_config {
  enabled = true
}

# Datacenter
datacenter = "dc1"

# Server configuration
server = true
bootstrap_expect = 3

# Network configuration
client_addr = "ash3c"
bind_addr = "ash3c"
advertise_addr = "ash3c"

# Ports
ports {
  dns = 8600
  http = 8500
  https = -1
  grpc = 8502
  grpc_tls = 8503
  serf_lan = 8301
  serf_wan = 8302
  server = 8300
}

# Cluster join
retry_join = ["master", "warden"]

# Service discovery
enable_service_script = true
enable_script_checks = true
enable_local_script_checks = true

# Performance tuning
performance {
  raft_multiplier = 1
}

# Logging
log_level = "INFO"
enable_syslog = false
log_file = "/var/log/consul/consul.log"

# Security
encrypt = "YourEncryptionKeyHere"

# Connection settings
reconnect_timeout = "30s"
reconnect_timeout_wan = "30s"
session_ttl_min = "10s"

# Autopilot
autopilot {
  cleanup_dead_servers = true
  last_contact_threshold = "200ms"
  max_trailing_logs = 250
  server_stabilization_time = "10s"
  redundancy_zone_tag = ""
  disable_upgrade_migration = false
  upgrade_version_tag = ""
}

# Snapshots
snapshot {
  enabled = true
  interval = "24h"
  retain = 30
  name = "consul-snapshot-{{.Timestamp}}"
}

# Backups
backup {
  enabled = true
  interval = "6h"
  retain = 7
  name = "consul-backup-{{.Timestamp}}"
}
EOF
        destination = "local/consul.hcl"
      }

      config {
        command = "consul"
        args = [
          "agent",
          "-config-dir=local"
        ]
      }

      resources {
        cpu    = 300
        memory = 512
      }
    }
  }

  group "consul-warden" {
    count = 1

    constraint {
      attribute = "${node.unique.name}"
      value     = "bj-warden"
    }

    network {
      port "http" {
        static = 8500
      }
      port "rpc" {
        static = 8300
      }
      port "serf_lan" {
        static = 8301
      }
      port "serf_wan" {
        static = 8302
      }
    }

    task "consul" {
      driver = "exec"

      # Generate the configuration file from a template
      template {
        data = <<EOF
# Consul configuration file - dynamically generated
# This file is generated by consul-template from configuration held in the Consul KV store

# Base configuration
data_dir = "/opt/consul/data"
raft_dir = "/opt/consul/raft"

# Enable the UI
ui_config {
  enabled = true
}

# Datacenter
datacenter = "dc1"

# Server configuration
server = true
bootstrap_expect = 3

# Network configuration
client_addr = "warden"
bind_addr = "warden"
advertise_addr = "warden"

# Ports
ports {
  dns = 8600
  http = 8500
  https = -1
  grpc = 8502
  grpc_tls = 8503
  serf_lan = 8301
  serf_wan = 8302
  server = 8300
}

# Cluster join
retry_join = ["master", "ash3c"]

# Service discovery
enable_service_script = true
enable_script_checks = true
enable_local_script_checks = true

# Performance tuning
performance {
  raft_multiplier = 1
}

# Logging
log_level = "INFO"
enable_syslog = false
log_file = "/var/log/consul/consul.log"

# Security
encrypt = "YourEncryptionKeyHere"

# Connection settings
reconnect_timeout = "30s"
reconnect_timeout_wan = "30s"
session_ttl_min = "10s"

# Autopilot
autopilot {
  cleanup_dead_servers = true
  last_contact_threshold = "200ms"
  max_trailing_logs = 250
  server_stabilization_time = "10s"
  redundancy_zone_tag = ""
  disable_upgrade_migration = false
  upgrade_version_tag = ""
}

# Snapshots
snapshot {
  enabled = true
  interval = "24h"
  retain = 30
  name = "consul-snapshot-{{.Timestamp}}"
}

# Backups
backup {
  enabled = true
  interval = "6h"
  retain = 7
  name = "consul-backup-{{.Timestamp}}"
}
EOF
        destination = "local/consul.hcl"
      }

      config {
        command = "consul"
        args = [
          "agent",
          "-config-dir=local"
        ]
      }

      resources {
        cpu    = 300
        memory = 512
      }
    }
  }
}
@@ -1,421 +0,0 @@
job "consul-cluster-kv" {
  datacenters = ["dc1"]
  type = "service"

  group "consul-master" {
    count = 1

    constraint {
      attribute = "${node.unique.name}"
      value     = "kr-master"
    }

    network {
      port "http" {
        static = 8500
      }
      port "rpc" {
        static = 8300
      }
      port "serf_lan" {
        static = 8301
      }
      port "serf_wan" {
        static = 8302
      }
    }

    task "consul" {
      driver = "exec"

      # Fetch the configuration from Consul KV via a template
      template {
        data = <<EOF
# Consul configuration file - fetched dynamically from the KV store
# Follows the config/{environment}/{provider}/{region_or_service}/{key} format

# Base configuration
data_dir = "{{ keyOrDefault `config/dev/consul/cluster/data_dir` `/opt/consul/data` }}"
raft_dir = "{{ keyOrDefault `config/dev/consul/cluster/raft_dir` `/opt/consul/raft` }}"

# Enable the UI
ui_config {
  enabled = {{ keyOrDefault `config/dev/consul/ui/enabled` `true` }}
}

# Datacenter
datacenter = "{{ keyOrDefault `config/dev/consul/cluster/datacenter` `dc1` }}"

# Server configuration
server = true
bootstrap_expect = {{ keyOrDefault `config/dev/consul/cluster/bootstrap_expect` `3` }}

# Network configuration
client_addr = "{{ keyOrDefault `config/dev/consul/nodes/master/hostname` `master` }}"
bind_addr = "{{ keyOrDefault `config/dev/consul/nodes/master/hostname` `master` }}"
advertise_addr = "{{ keyOrDefault `config/dev/consul/nodes/master/hostname` `master` }}"

# Ports
ports {
  dns = {{ keyOrDefault `config/dev/consul/ports/dns` `8600` }}
  http = {{ keyOrDefault `config/dev/consul/ports/http` `8500` }}
  https = {{ keyOrDefault `config/dev/consul/ports/https` `-1` }}
  grpc = {{ keyOrDefault `config/dev/consul/ports/grpc` `8502` }}
  grpc_tls = {{ keyOrDefault `config/dev/consul/ports/grpc_tls` `8503` }}
  serf_lan = {{ keyOrDefault `config/dev/consul/ports/serf_lan` `8301` }}
  serf_wan = {{ keyOrDefault `config/dev/consul/ports/serf_wan` `8302` }}
  server = {{ keyOrDefault `config/dev/consul/ports/server` `8300` }}
}

# Cluster join - fetch the other nodes' addresses from KV
retry_join = [
  "{{ keyOrDefault `config/dev/consul/nodes/ash3c/hostname` `ash3c` }}",
  "{{ keyOrDefault `config/dev/consul/nodes/warden/hostname` `warden` }}"
]

# Service discovery
enable_service_script = {{ keyOrDefault `config/dev/consul/service/enable_service_script` `true` }}
enable_script_checks = {{ keyOrDefault `config/dev/consul/service/enable_script_checks` `true` }}
enable_local_script_checks = {{ keyOrDefault `config/dev/consul/service/enable_local_script_checks` `true` }}

# Performance tuning
performance {
  raft_multiplier = {{ keyOrDefault `config/dev/consul/performance/raft_multiplier` `1` }}
}

# Logging
log_level = "{{ keyOrDefault `config/dev/consul/cluster/log_level` `INFO` }}"
enable_syslog = {{ keyOrDefault `config/dev/consul/log/enable_syslog` `false` }}
log_file = "{{ keyOrDefault `config/dev/consul/log/log_file` `/var/log/consul/consul.log` }}"

# Security
encrypt = "{{ keyOrDefault `config/dev/consul/cluster/encrypt_key` `YourEncryptionKeyHere` }}"

# Connection settings
reconnect_timeout = "{{ keyOrDefault `config/dev/consul/connection/reconnect_timeout` `30s` }}"
reconnect_timeout_wan = "{{ keyOrDefault `config/dev/consul/connection/reconnect_timeout_wan` `30s` }}"
session_ttl_min = "{{ keyOrDefault `config/dev/consul/connection/session_ttl_min` `10s` }}"

# Autopilot
autopilot {
  cleanup_dead_servers = {{ keyOrDefault `config/dev/consul/autopilot/cleanup_dead_servers` `true` }}
  last_contact_threshold = "{{ keyOrDefault `config/dev/consul/autopilot/last_contact_threshold` `200ms` }}"
  max_trailing_logs = {{ keyOrDefault `config/dev/consul/autopilot/max_trailing_logs` `250` }}
  server_stabilization_time = "{{ keyOrDefault `config/dev/consul/autopilot/server_stabilization_time` `10s` }}"
  redundancy_zone_tag = ""
  disable_upgrade_migration = {{ keyOrDefault `config/dev/consul/autopilot/disable_upgrade_migration` `false` }}
  upgrade_version_tag = ""
}

# Snapshots
snapshot {
  enabled = {{ keyOrDefault `config/dev/consul/snapshot/enabled` `true` }}
  interval = "{{ keyOrDefault `config/dev/consul/snapshot/interval` `24h` }}"
  retain = {{ keyOrDefault `config/dev/consul/snapshot/retain` `30` }}
  name = "{{ keyOrDefault `config/dev/consul/snapshot/name` `consul-snapshot-{{.Timestamp}}` }}"
}

# Backups
backup {
  enabled = {{ keyOrDefault `config/dev/consul/backup/enabled` `true` }}
  interval = "{{ keyOrDefault `config/dev/consul/backup/interval` `6h` }}"
  retain = {{ keyOrDefault `config/dev/consul/backup/retain` `7` }}
  name = "{{ keyOrDefault `config/dev/consul/backup/name` `consul-backup-{{.Timestamp}}` }}"
}
EOF
        destination = "local/consul.hcl"
      }

      config {
        command = "consul"
        args = [
          "agent",
          "-config-dir=local"
        ]
      }

      resources {
        cpu    = 300
        memory = 512
      }
    }
  }

  group "consul-ash3c" {
    count = 1

    constraint {
      attribute = "${node.unique.name}"
      value     = "us-ash3c"
    }

    network {
      port "http" {
        static = 8500
      }
      port "rpc" {
        static = 8300
      }
      port "serf_lan" {
        static = 8301
      }
      port "serf_wan" {
        static = 8302
      }
    }

    task "consul" {
      driver = "exec"

      # Fetch the configuration from Consul KV via a template
      template {
        data = <<EOF
# Consul configuration file - fetched dynamically from the KV store
# Follows the config/{environment}/{provider}/{region_or_service}/{key} format

# Base configuration
data_dir = "{{ keyOrDefault `config/dev/consul/cluster/data_dir` `/opt/consul/data` }}"
raft_dir = "{{ keyOrDefault `config/dev/consul/cluster/raft_dir` `/opt/consul/raft` }}"

# Enable the UI
ui_config {
  enabled = {{ keyOrDefault `config/dev/consul/ui/enabled` `true` }}
}

# Datacenter
datacenter = "{{ keyOrDefault `config/dev/consul/cluster/datacenter` `dc1` }}"

# Server configuration
server = true
bootstrap_expect = {{ keyOrDefault `config/dev/consul/cluster/bootstrap_expect` `3` }}

# Network configuration
client_addr = "{{ keyOrDefault `config/dev/consul/nodes/ash3c/hostname` `ash3c` }}"
bind_addr = "{{ keyOrDefault `config/dev/consul/nodes/ash3c/hostname` `ash3c` }}"
advertise_addr = "{{ keyOrDefault `config/dev/consul/nodes/ash3c/hostname` `ash3c` }}"

# Ports
ports {
  dns = {{ keyOrDefault `config/dev/consul/ports/dns` `8600` }}
  http = {{ keyOrDefault `config/dev/consul/ports/http` `8500` }}
  https = {{ keyOrDefault `config/dev/consul/ports/https` `-1` }}
  grpc = {{ keyOrDefault `config/dev/consul/ports/grpc` `8502` }}
  grpc_tls = {{ keyOrDefault `config/dev/consul/ports/grpc_tls` `8503` }}
  serf_lan = {{ keyOrDefault `config/dev/consul/ports/serf_lan` `8301` }}
  serf_wan = {{ keyOrDefault `config/dev/consul/ports/serf_wan` `8302` }}
  server = {{ keyOrDefault `config/dev/consul/ports/server` `8300` }}
}

# Cluster join - fetch the other nodes' addresses from KV
retry_join = [
  "{{ keyOrDefault `config/dev/consul/nodes/master/hostname` `master` }}",
  "{{ keyOrDefault `config/dev/consul/nodes/warden/hostname` `warden` }}"
]

# Service discovery
enable_service_script = {{ keyOrDefault `config/dev/consul/service/enable_service_script` `true` }}
enable_script_checks = {{ keyOrDefault `config/dev/consul/service/enable_script_checks` `true` }}
enable_local_script_checks = {{ keyOrDefault `config/dev/consul/service/enable_local_script_checks` `true` }}

# Performance tuning
performance {
  raft_multiplier = {{ keyOrDefault `config/dev/consul/performance/raft_multiplier` `1` }}
}

# Logging
log_level = "{{ keyOrDefault `config/dev/consul/cluster/log_level` `INFO` }}"
enable_syslog = {{ keyOrDefault `config/dev/consul/log/enable_syslog` `false` }}
log_file = "{{ keyOrDefault `config/dev/consul/log/log_file` `/var/log/consul/consul.log` }}"

# Security
encrypt = "{{ keyOrDefault `config/dev/consul/cluster/encrypt_key` `YourEncryptionKeyHere` }}"

# Connection settings
reconnect_timeout = "{{ keyOrDefault `config/dev/consul/connection/reconnect_timeout` `30s` }}"
reconnect_timeout_wan = "{{ keyOrDefault `config/dev/consul/connection/reconnect_timeout_wan` `30s` }}"
session_ttl_min = "{{ keyOrDefault `config/dev/consul/connection/session_ttl_min` `10s` }}"

# Autopilot
autopilot {
  cleanup_dead_servers = {{ keyOrDefault `config/dev/consul/autopilot/cleanup_dead_servers` `true` }}
  last_contact_threshold = "{{ keyOrDefault `config/dev/consul/autopilot/last_contact_threshold` `200ms` }}"
  max_trailing_logs = {{ keyOrDefault `config/dev/consul/autopilot/max_trailing_logs` `250` }}
  server_stabilization_time = "{{ keyOrDefault `config/dev/consul/autopilot/server_stabilization_time` `10s` }}"
  redundancy_zone_tag = ""
  disable_upgrade_migration = {{ keyOrDefault `config/dev/consul/autopilot/disable_upgrade_migration` `false` }}
  upgrade_version_tag = ""
}

# Snapshots
snapshot {
  enabled = {{ keyOrDefault `config/dev/consul/snapshot/enabled` `true` }}
  interval = "{{ keyOrDefault `config/dev/consul/snapshot/interval` `24h` }}"
  retain = {{ keyOrDefault `config/dev/consul/snapshot/retain` `30` }}
  name = "{{ keyOrDefault `config/dev/consul/snapshot/name` `consul-snapshot-{{.Timestamp}}` }}"
}

# Backups
backup {
  enabled = {{ keyOrDefault `config/dev/consul/backup/enabled` `true` }}
  interval = "{{ keyOrDefault `config/dev/consul/backup/interval` `6h` }}"
  retain = {{ keyOrDefault `config/dev/consul/backup/retain` `7` }}
  name = "{{ keyOrDefault `config/dev/consul/backup/name` `consul-backup-{{.Timestamp}}` }}"
}
EOF
        destination = "local/consul.hcl"
      }

      config {
        command = "consul"
        args = [
          "agent",
          "-config-dir=local"
        ]
      }

      resources {
        cpu    = 300
        memory = 512
      }
    }
  }

  group "consul-warden" {
    count = 1

    constraint {
      attribute = "${node.unique.name}"
      value     = "bj-warden"
    }

    network {
      port "http" {
        static = 8500
      }
      port "rpc" {
        static = 8300
      }
      port "serf_lan" {
        static = 8301
      }
      port "serf_wan" {
        static = 8302
      }
    }

    task "consul" {
      driver = "exec"

      # Fetch the configuration from Consul KV via a template
      template {
        data = <<EOF
# Consul configuration file - fetched dynamically from the KV store
# Follows the config/{environment}/{provider}/{region_or_service}/{key} format

# Base configuration
data_dir = "{{ keyOrDefault `config/dev/consul/cluster/data_dir` `/opt/consul/data` }}"
raft_dir = "{{ keyOrDefault `config/dev/consul/cluster/raft_dir` `/opt/consul/raft` }}"

# Enable the UI
ui_config {
  enabled = {{ keyOrDefault `config/dev/consul/ui/enabled` `true` }}
}

# Datacenter
datacenter = "{{ keyOrDefault `config/dev/consul/cluster/datacenter` `dc1` }}"

# Server configuration
server = true
bootstrap_expect = {{ keyOrDefault `config/dev/consul/cluster/bootstrap_expect` `3` }}

# Network configuration
client_addr = "{{ keyOrDefault `config/dev/consul/nodes/warden/hostname` `warden` }}"
bind_addr = "{{ keyOrDefault `config/dev/consul/nodes/warden/hostname` `warden` }}"
advertise_addr = "{{ keyOrDefault `config/dev/consul/nodes/warden/hostname` `warden` }}"

# Ports
ports {
  dns = {{ keyOrDefault `config/dev/consul/ports/dns` `8600` }}
  http = {{ keyOrDefault `config/dev/consul/ports/http` `8500` }}
  https = {{ keyOrDefault `config/dev/consul/ports/https` `-1` }}
  grpc = {{ keyOrDefault `config/dev/consul/ports/grpc` `8502` }}
  grpc_tls = {{ keyOrDefault `config/dev/consul/ports/grpc_tls` `8503` }}
  serf_lan = {{ keyOrDefault `config/dev/consul/ports/serf_lan` `8301` }}
  serf_wan = {{ keyOrDefault `config/dev/consul/ports/serf_wan` `8302` }}
  server = {{ keyOrDefault `config/dev/consul/ports/server` `8300` }}
}

# Cluster join - fetch the other nodes' addresses from KV
retry_join = [
  "{{ keyOrDefault `config/dev/consul/nodes/master/hostname` `master` }}",
  "{{ keyOrDefault `config/dev/consul/nodes/ash3c/hostname` `ash3c` }}"
]

# Service discovery
enable_service_script = {{ keyOrDefault `config/dev/consul/service/enable_service_script` `true` }}
enable_script_checks = {{ keyOrDefault `config/dev/consul/service/enable_script_checks` `true` }}
enable_local_script_checks = {{ keyOrDefault `config/dev/consul/service/enable_local_script_checks` `true` }}

# Performance tuning
performance {
  raft_multiplier = {{ keyOrDefault `config/dev/consul/performance/raft_multiplier` `1` }}
}

# Logging
log_level = "{{ keyOrDefault `config/dev/consul/cluster/log_level` `INFO` }}"
enable_syslog = {{ keyOrDefault `config/dev/consul/log/enable_syslog` `false` }}
log_file = "{{ keyOrDefault `config/dev/consul/log/log_file` `/var/log/consul/consul.log` }}"

# Security
encrypt = "{{ keyOrDefault `config/dev/consul/cluster/encrypt_key` `YourEncryptionKeyHere` }}"

# Connection settings
reconnect_timeout = "{{ keyOrDefault `config/dev/consul/connection/reconnect_timeout` `30s` }}"
reconnect_timeout_wan = "{{ keyOrDefault `config/dev/consul/connection/reconnect_timeout_wan` `30s` }}"
session_ttl_min = "{{ keyOrDefault `config/dev/consul/connection/session_ttl_min` `10s` }}"

# Autopilot
autopilot {
  cleanup_dead_servers = {{ keyOrDefault `config/dev/consul/autopilot/cleanup_dead_servers` `true` }}
  last_contact_threshold = "{{ keyOrDefault `config/dev/consul/autopilot/last_contact_threshold` `200ms` }}"
  max_trailing_logs = {{ keyOrDefault `config/dev/consul/autopilot/max_trailing_logs` `250` }}
  server_stabilization_time = "{{ keyOrDefault `config/dev/consul/autopilot/server_stabilization_time` `10s` }}"
  redundancy_zone_tag = ""
  disable_upgrade_migration = {{ keyOrDefault `config/dev/consul/autopilot/disable_upgrade_migration` `false` }}
  upgrade_version_tag = ""
}

# Snapshots
snapshot {
  enabled = {{ keyOrDefault `config/dev/consul/snapshot/enabled` `true` }}
  interval = "{{ keyOrDefault `config/dev/consul/snapshot/interval` `24h` }}"
  retain = {{ keyOrDefault `config/dev/consul/snapshot/retain` `30` }}
  name = "{{ keyOrDefault `config/dev/consul/snapshot/name` `consul-snapshot-{{.Timestamp}}` }}"
}

# Backups
backup {
  enabled = {{ keyOrDefault `config/dev/consul/backup/enabled` `true` }}
  interval = "{{ keyOrDefault `config/dev/consul/backup/interval` `6h` }}"
  retain = {{ keyOrDefault `config/dev/consul/backup/retain` `7` }}
  name = "{{ keyOrDefault `config/dev/consul/backup/name` `consul-backup-{{.Timestamp}}` }}"
}
EOF
        destination = "local/consul.hcl"
      }

      config {
        command = "consul"
        args = [
          "agent",
          "-config-dir=local"
        ]
      }

      resources {
        cpu    = 300
        memory = 512
      }
    }
  }
}
@@ -1,225 +0,0 @@
job "consul-cluster-simple" {
  datacenters = ["dc1"]
  type = "service"

  group "consul-master" {
    count = 1

    constraint {
      attribute = "${node.unique.name}"
      value     = "kr-master"
    }

    network {
      port "http" {
        static = 8500
      }
      port "rpc" {
        static = 8300
      }
      port "serf_lan" {
        static = 8301
      }
      port "serf_wan" {
        static = 8302
      }
    }

    task "consul" {
      driver = "exec"

      config {
        command = "consul"
        args = [
          "agent",
          "-server",
          "-bootstrap-expect=3",
          "-data-dir=/opt/nomad/data/consul",
          "-client=0.0.0.0",
          "-bind=0.0.0.0",
          "-advertise=100.117.106.136",
          "-retry-join=100.116.80.94",
          "-retry-join=100.122.197.112",
          "-ui",
          "-http-port=${NOMAD_PORT_http}",
          "-server-port=${NOMAD_PORT_rpc}",
          "-serf-lan-port=${NOMAD_PORT_serf_lan}",
          "-serf-wan-port=${NOMAD_PORT_serf_wan}"
        ]
      }

      resources {
        cpu    = 300
        memory = 512
      }
    }
  }

  group "consul-ash3c" {
    count = 1

    constraint {
      attribute = "${node.unique.name}"
      value     = "us-ash3c"
    }

    network {
      port "http" {
        static = 8500
      }
      port "rpc" {
        static = 8300
      }
      port "serf_lan" {
        static = 8301
      }
      port "serf_wan" {
        static = 8302
      }
    }

    task "consul" {
      driver = "exec"

      config {
        command = "consul"
        args = [
          "agent",
          "-server",
          "-bootstrap-expect=3",
          "-data-dir=/opt/nomad/data/consul",
          "-client=0.0.0.0",
          "-bind=0.0.0.0",
          "-advertise=100.116.80.94",
          "-retry-join=100.117.106.136",
          "-retry-join=100.122.197.112",
          "-ui",
          "-http-port=${NOMAD_PORT_http}",
          "-server-port=${NOMAD_PORT_rpc}",
          "-serf-lan-port=${NOMAD_PORT_serf_lan}",
          "-serf-wan-port=${NOMAD_PORT_serf_wan}"
        ]
      }

      resources {
        cpu    = 300
        memory = 512
      }
    }
  }

  group "consul-warden" {
    count = 1

    constraint {
      attribute = "${node.unique.name}"
      value     = "bj-warden"
    }

    network {
      port "http" {
        static = 8500
      }
      port "rpc" {
        static = 8300
      }
      port "serf_lan" {
        static = 8301
      }
      port "serf_wan" {
        static = 8302
      }
    }

    task "consul" {
      driver = "exec"

      config {
        command = "consul"
        args = [
          "agent",
          "-server",
          "-bootstrap-expect=3",
          "-data-dir=/opt/nomad/data/consul",
          "-client=0.0.0.0",
          "-bind=100.122.197.112",
          "-advertise=100.122.197.112",
          "-retry-join=100.117.106.136",
          "-retry-join=100.116.80.94",
          "-ui",
          "-http-port=${NOMAD_PORT_http}",
          "-server-port=${NOMAD_PORT_rpc}",
          "-serf-lan-port=${NOMAD_PORT_serf_lan}",
          "-serf-wan-port=${NOMAD_PORT_serf_wan}"
        ]
      }

      resources {
        cpu    = 300
        memory = 512
      }
    }
  }

  group "consul-semaphore" {
    count = 1

    constraint {
      attribute = "${node.unique.name}"
      value     = "semaphore"
    }

    network {
      port "http" {
        static = 8500
      }
      port "rpc" {
        static = 8300
      }
      port "serf_lan" {
        static = 8301
      }
      port "serf_wan" {
        static = 8302
      }
    }

    task "consul" {
      driver = "exec"

      config {
        command = "consul"
        args = [
          "agent",
          "-server",
          "-bootstrap-expect=3",
          "-data-dir=/opt/nomad/data/consul",
          "-client=0.0.0.0",
          "-bind=100.116.158.95",
          "-advertise=100.116.158.95",
          "-retry-join=100.117.106.136",
          "-retry-join=100.116.80.94",
          "-retry-join=100.122.197.112",
          "-ui",
          "-http-port=${NOMAD_PORT_http}",
          "-server-port=${NOMAD_PORT_rpc}",
          "-serf-lan-port=${NOMAD_PORT_serf_lan}",
          "-serf-wan-port=${NOMAD_PORT_serf_wan}"
        ]
      }

      resources {
        cpu    = 300
        memory = 512
      }
    }
  }
}
@@ -1,57 +1,115 @@
job "consul-cluster" {
job "consul-cluster-nomad" {
  datacenters = ["dc1"]
  type = "service"

  group "consul-servers" {
    count = 3

  group "consul-master" {
    constraint {
      attribute = "${node.unique.name}"
      operator  = "regexp"
      value     = "(master|ash3c|hcp)"
      value     = "master"
    }

    task "consul" {
      driver = "podman"
      driver = "exec"

      config {
        image = "hashicorp/consul:latest"
        ports = ["server", "serf_lan", "serf_wan", "ui"]
        command = "consul"
        args = [
          "agent",
          "-server",
          "-bootstrap-expect=3",
          "-data-dir=/consul/data",
          "-ui",
          "-data-dir=/opt/nomad/data/consul",
          "-client=0.0.0.0",
          "-bind={{ env `NOMAD_IP_server` }}",
          "-retry-join=100.117.106.136",
          "-bind=100.117.106.136",
          "-advertise=100.117.106.136",
          "-retry-join=100.116.80.94",
          "-retry-join=100.76.13.187"
          "-retry-join=100.122.197.112",
          "-ui",
          "-http-port=8500",
          "-server-port=8300",
          "-serf-lan-port=8301",
          "-serf-wan-port=8302"
        ]
      }

      volume_mount {
        volume      = "consul-data"
        destination = "/consul/data"
        read_only   = false
      resources {
        cpu    = 300
        memory = 512
      }
    }
  }

  group "consul-ash3c" {
    constraint {
      attribute = "${node.unique.name}"
      value     = "ash3c"
    }

    task "consul" {
      driver = "exec"

      config {
        command = "consul"
        args = [
          "agent",
          "-server",
          "-bootstrap-expect=3",
          "-data-dir=/opt/nomad/data/consul",
          "-client=0.0.0.0",
          "-bind=100.116.80.94",
          "-advertise=100.116.80.94",
          "-retry-join=100.117.106.136",
          "-retry-join=100.122.197.112",
          "-ui",
          "-http-port=8500",
          "-server-port=8300",
          "-serf-lan-port=8301",
          "-serf-wan-port=8302"
        ]
      }

      resources {
        network {
          mbits = 10
          port "server" { static = 8300 }
          port "serf_lan" { static = 8301 }
          port "serf_wan" { static = 8302 }
          port "ui" { static = 8500 }
        cpu    = 300
        memory = 512
      }
    }
  }

  volume "consul-data" {
    type      = "host"
    read_only = false
    source    = "consul-data"
  group "consul-warden" {
    constraint {
      attribute = "${node.unique.name}"
      value     = "warden"
    }

    task "consul" {
      driver = "exec"
      config {
        command = "consul"
        args = [
          "agent",
          "-server",
          "-bootstrap-expect=3",
          "-data-dir=/opt/nomad/data/consul",
          "-client=0.0.0.0",
          "-bind=100.122.197.112",
          "-advertise=100.122.197.112",
          "-retry-join=100.117.106.136",
          "-retry-join=100.116.80.94",
          "-ui",
          "-http-port=8500",
          "-server-port=8300",
          "-serf-lan-port=8301",
          "-serf-wan-port=8302"
        ]
      }

      resources {
        cpu    = 300
        memory = 512
      }
    }
  }
}
@ -0,0 +1,8 @@
# Nomad Configuration

## Jobs

- `install-podman-driver.nomad` - installs the Podman driver
- `nomad-consul-config.nomad` - Nomad-Consul configuration
- `nomad-consul-setup.nomad` - Nomad-Consul setup
- `nomad-nfs-volume.nomad` - NFS volume configuration

@ -0,0 +1,55 @@
job "nomad-consul-config" {
  datacenters = ["dc1"]
  type        = "system"

  group "nomad-server-config" {
    constraint {
      attribute = "${node.unique.name}"
      operator  = "regexp"
      value     = "semaphore|ash1d|ash2e|ch2|ch3|onecloud1|de"
    }

    task "update-nomad-config" {
      driver = "exec"

      config {
        command = "sh"
        args = [
          "-c",
          "sed -i '/^consul {/,/^}/c\\consul {\\n address = \"master.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500\"\\n server_service_name = \"nomad\"\\n client_service_name = \"nomad-client\"\\n auto_advertise = true\\n server_auto_join = true\\n client_auto_join = false\\n}' /etc/nomad.d/nomad.hcl && systemctl restart nomad"
        ]
      }

      resources {
        cpu    = 100
        memory = 128
      }
    }
  }

  group "nomad-client-config" {
    constraint {
      attribute = "${node.unique.name}"
      operator  = "regexp"
      value     = "master|ash3c|browser|influxdb1|hcp1|warden"
    }

    task "update-nomad-config" {
      driver = "exec"

      config {
        command = "sh"
        args = [
          "-c",
          "sed -i '/^consul {/,/^}/c\\consul {\\n address = \"master.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500\"\\n server_service_name = \"nomad\"\\n client_service_name = \"nomad-client\"\\n auto_advertise = true\\n server_auto_join = false\\n client_auto_join = true\\n}' /etc/nomad.d/nomad.hcl && systemctl restart nomad"
        ]
      }

      resources {
        cpu    = 100
        memory = 128
      }
    }
  }
}

@ -0,0 +1,23 @@
job "nomad-consul-setup" {
  datacenters = ["dc1"]
  type        = "system"

  group "nomad-config" {
    task "setup-consul" {
      driver = "exec"

      config {
        command = "sh"
        args = [
          "-c",
          "if grep -q 'server.*enabled.*true' /etc/nomad.d/nomad.hcl; then sed -i '/^consul {/,/^}/c\\consul {\\n address = \"master.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500\"\\n server_service_name = \"nomad\"\\n client_service_name = \"nomad-client\"\\n auto_advertise = true\\n server_auto_join = true\\n client_auto_join = false\\n}' /etc/nomad.d/nomad.hcl; else sed -i '/^consul {/,/^}/c\\consul {\\n address = \"master.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500\"\\n server_service_name = \"nomad\"\\n client_service_name = \"nomad-client\"\\n auto_advertise = true\\n server_auto_join = false\\n client_auto_join = true\\n}' /etc/nomad.d/nomad.hcl; fi && systemctl restart nomad"
        ]
      }

      resources {
        cpu    = 100
        memory = 128
      }
    }
  }
}

@ -0,0 +1,28 @@
# Traefik Configuration

## Deployment

```bash
nomad job run components/traefik/jobs/traefik.nomad
```

## Configuration Highlights

- Binds explicitly to the Tailscale IP (100.97.62.111)
- Geography-aware ordering of the Consul cluster (Beijing → Korea → US)
- Relaxed health checks suited to trans-Pacific links
- No service health checks, to avoid flapping

## Access

- Dashboard: `http://hcp1.tailnet-68f9.ts.net:8080/dashboard/`
- Direct IP: `http://100.97.62.111:8080/dashboard/`
- Consul LB: `http://hcp1.tailnet-68f9.ts.net:80`

## Troubleshooting

If you run into service flapping (a quick probe sketch follows this list):
1. Check whether RFC1918 private addresses are in use
2. Verify Tailscale network connectivity
3. Increase the health-check interval
4. Account for the network latency introduced by geography

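A quick way to confirm the entrypoints actually answer is to probe Consul's leader endpoint through the load balancer; a minimal smoke-test sketch, assuming the hostnames above resolve on the tailnet:

```bash
# The web entrypoint proxies the Consul API, so the leader endpoint
# should return a quoted "ip:8300" string; an empty body means no leader reachable.
curl -s http://hcp1.tailnet-68f9.ts.net:80/v1/status/leader

# The dashboard should answer 200 on the traefik entrypoint.
curl -s -o /dev/null -w "%{http_code}\n" http://100.97.62.111:8080/dashboard/
```
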
@ -0,0 +1,97 @@
job "traefik-consul-lb" {
  datacenters = ["dc1"]
  type        = "service"

  group "traefik" {
    count = 1

    constraint {
      attribute = "${node.unique.name}"
      value     = "hcp1"
    }

    update {
      min_healthy_time  = "5s"
      healthy_deadline  = "10m"
      progress_deadline = "15m"
      auto_revert       = false
    }

    network {
      mode = "host"
      port "http" {
        static       = 80
        host_network = "tailscale0"
      }
      port "traefik" {
        static       = 8080
        host_network = "tailscale0"
      }
    }

    task "traefik" {
      driver = "exec"

      config {
        command = "/usr/local/bin/traefik"
        args = [
          "--configfile=/local/traefik.yml"
        ]
      }

      template {
        data = <<EOF
api:
  dashboard: true
  insecure: true

entryPoints:
  web:
    address: "100.97.62.111:80"
  traefik:
    address: "100.97.62.111:8080"

providers:
  file:
    filename: /local/dynamic.yml
    watch: true

log:
  level: INFO
EOF
        destination = "local/traefik.yml"
      }

      template {
        data = <<EOF
http:
  services:
    consul-cluster:
      loadBalancer:
        servers:
          - url: "http://warden.tailnet-68f9.ts.net:8500"  # Beijing, preferred
          - url: "http://master.tailnet-68f9.ts.net:8500"  # fallback
          - url: "http://ash3c.tailnet-68f9.ts.net:8500"   # fallback
        healthCheck:
          path: "/v1/status/leader"
          interval: "30s"
          timeout: "15s"

  routers:
    consul-api:
      rule: "PathPrefix(`/`)"
      service: consul-cluster
      entryPoints:
        - web
EOF
        destination = "local/dynamic.yml"
      }

      resources {
        cpu    = 500
        memory = 512
      }

    }
  }
}

@ -0,0 +1,7 @@
# Vault Configuration

## Jobs

- `vault-cluster-exec.nomad` - Vault cluster (exec driver)
- `vault-cluster-podman.nomad` - Vault cluster (podman driver)
- `vault-dev-warden.nomad` - Vault development environment

@ -39,8 +39,14 @@ job "vault-cluster-exec" {

template {
  data = <<EOH
storage "file" {
  path = "/opt/nomad/data/vault/data"
storage "consul" {
  address = "{{ with nomadService "consul" }}{{ range . }}{{ if contains .Tags "http" }}{{ .Address }}:{{ .Port }}{{ end }}{{ end }}{{ end }}"
  path = "vault/"
  # Consul service discovery configuration
  service {
    name = "vault"
    tags = ["vault"]
  }
}

listener "tcp" {

@ -58,20 +64,12 @@ disable_mlock = true
|
|||
disable_sealwrap = true
|
||||
disable_cache = false
|
||||
|
||||
# 配置consul连接
|
||||
consul {
|
||||
address = "127.0.0.1:8500"
|
||||
path = "vault/"
|
||||
# 注意:可能需要配置token
|
||||
# token = "your-consul-token"
|
||||
}
|
||||
# 启用原始日志记录
|
||||
enable_raw_log = true
|
||||
|
||||
# 配置consul连接
|
||||
consul {
|
||||
address = "127.0.0.1:8500"
|
||||
path = "vault/"
|
||||
# 注意:可能需要配置token
|
||||
# token = "your-consul-token"
|
||||
# 集成Nomad服务发现
|
||||
service_registration {
|
||||
enabled = true
|
||||
}
|
||||
EOH
|
||||
destination = "/opt/nomad/data/vault/config/vault.hcl"
|
||||
|
|
@ -100,14 +98,7 @@ EOH
group "vault-ash3c" {
  count = 1

  # Explicitly pin the Consul version requirement, overriding the automatic constraint
  constraint {
    attribute = "${attr.consul.version}"
    operator  = "version"
    value     = ">= 1.0.0"
  }

  # Add an always-satisfied constraint to guarantee scheduling
  # Drop the Consul version constraint and use a driver constraint instead
  constraint {
    attribute = "${driver.exec}"
    operator  = "="

@ -141,8 +132,14 @@ EOH

template {
  data = <<EOH
storage "file" {
  path = "/opt/nomad/data/vault/data"
storage "consul" {
  address = "{{ with nomadService "consul" }}{{ range . }}{{ if contains .Tags "http" }}{{ .Address }}:{{ .Port }}{{ end }}{{ end }}{{ end }}"
  path = "vault/"
  # Consul service discovery configuration
  service {
    name = "vault"
    tags = ["vault"]
  }
}

listener "tcp" {

@ -159,6 +156,14 @@ disable_mlock = true
# Additional settings to work around permission issues
disable_sealwrap = true
disable_cache = false

# Enable raw log output
enable_raw_log = true

# Integrate with Nomad service discovery
service_registration {
  enabled = true
}
EOH
destination = "/opt/nomad/data/vault/config/vault.hcl"
}

@ -186,14 +191,7 @@ EOH
group "vault-warden" {
  count = 1

  # Explicitly pin the Consul version requirement, overriding the automatic constraint
  constraint {
    attribute = "${attr.consul.version}"
    operator  = "version"
    value     = ">= 1.0.0"
  }

  # Add an always-satisfied constraint to guarantee scheduling
  # Drop the Consul version constraint and use a driver constraint instead
  constraint {
    attribute = "${driver.exec}"
    operator  = "="

@ -227,8 +225,14 @@ EOH

template {
  data = <<EOH
storage "file" {
  path = "/opt/nomad/data/vault/data"
storage "consul" {
  address = "{{ with nomadService "consul" }}{{ range . }}{{ if contains .Tags "http" }}{{ .Address }}:{{ .Port }}{{ end }}{{ end }}{{ end }}"
  path = "vault/"
  # Consul service discovery configuration
  service {
    name = "vault"
    tags = ["vault"]
  }
}

listener "tcp" {

@ -245,6 +249,14 @@ disable_mlock = true
# Additional settings to work around permission issues
disable_sealwrap = true
disable_cache = false

# Enable raw log output
enable_raw_log = true

# Integrate with Nomad service discovery
service_registration {
  enabled = true
}
EOH
destination = "/opt/nomad/data/vault/config/vault.hcl"
}

@ -1,21 +1,23 @@
|
|||
[nomad_servers]
|
||||
# 服务器节点 (7个服务器节点)
|
||||
#本机,不操作bj-semaphore ansible_host=100.116.158.95 ansible_user=root ansible_password=3131 ansible_become_password=3131
|
||||
ash1d ansible_host=100.81.26.3 ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
ash2e ansible_host=100.103.147.94 ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
ch2 ansible_host=100.90.159.68 ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
ch3 ansible_host=100.86.141.112 ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
onecloud1 ansible_host=100.98.209.50 ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
de ansible_host=100.120.225.29 ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
# ⚠️ 警告:能力越大,责任越大!服务器节点操作需极其谨慎!
|
||||
# ⚠️ 任何对服务器节点的操作都可能影响整个集群的稳定性!
|
||||
semaphore ansible_host=semaphore.tailnet-68f9.ts.net ansible_user=root ansible_password=313131 ansible_become_password=313131
|
||||
ash1d ansible_host=ash1d.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
ash2e ansible_host=ash2e.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
ch2 ansible_host=ch2.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
ch3 ansible_host=ch3.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
onecloud1 ansible_host=onecloud1.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
de ansible_host=de.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
|
||||
[nomad_clients]
|
||||
# 客户端节点
|
||||
master ansible_host=100.117.106.136 ansible_user=ben ansible_password=3131 ansible_become_password=3131 ansible_port=60022
|
||||
ash3c ansible_host=100.116.80.94 ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
browser ansible_host=100.116.112.45 ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
influxdb1 ansible_host=100.116.80.94 ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
hcp1 ansible_host=100.97.62.111 ansible_user=root ansible_password=3131 ansible_become_password=3131
|
||||
warden ansible_host=100.122.197.112 ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
master ansible_host=master.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131 ansible_port=60022
|
||||
ash3c ansible_host=ash3c.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
browser ansible_host=browser.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
influxdb1 ansible_host=influxdb1.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
hcp1 ansible_host=hcp1.tailnet-68f9.ts.net ansible_user=root ansible_password=3131 ansible_become_password=3131
|
||||
warden ansible_host=warden.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
|
||||
[nomad_nodes:children]
|
||||
nomad_servers
|
||||
|
|
|
|||
|
|
@ -4,17 +4,6 @@
|
|||
become: yes
|
||||
vars:
|
||||
nomad_config_dir: /etc/nomad.d
|
||||
client_ip: "{{ ansible_host }}"
|
||||
|
||||
# Nomad节点名称(带地理位置前缀)
|
||||
client_name: >-
|
||||
{%- if inventory_hostname == 'influxdb1' -%}us-influxdb
|
||||
{%- elif inventory_hostname == 'master' -%}kr-master
|
||||
{%- elif inventory_hostname == 'hcp1' -%}bj-hcp1
|
||||
{%- elif inventory_hostname == 'hcp2' -%}bj-hcp2
|
||||
{%- elif inventory_hostname == 'warden' -%}bj-warden
|
||||
{%- else -%}{{ inventory_hostname }}
|
||||
{%- endif -%}
|
||||
|
||||
tasks:
|
||||
- name: 创建Nomad配置目录
|
||||
|
|
|
|||
|
|
@ -1,104 +0,0 @@
|
|||
---
|
||||
- name: 配置Nomad客户端节点
|
||||
hosts: target_nodes
|
||||
become: yes
|
||||
vars:
|
||||
nomad_config_dir: /etc/nomad.d
|
||||
|
||||
tasks:
|
||||
- name: 创建Nomad配置目录
|
||||
file:
|
||||
path: "{{ nomad_config_dir }}"
|
||||
state: directory
|
||||
owner: root
|
||||
group: root
|
||||
mode: '0755'
|
||||
|
||||
- name: 复制Nomad客户端配置
|
||||
copy:
|
||||
content: |
|
||||
datacenter = "dc1"
|
||||
data_dir = "/opt/nomad/data"
|
||||
log_level = "INFO"
|
||||
bind_addr = "0.0.0.0"
|
||||
|
||||
server {
|
||||
enabled = false
|
||||
}
|
||||
|
||||
client {
|
||||
enabled = true
|
||||
# 配置七姐妹服务器地址
|
||||
servers = [
|
||||
"100.116.158.95:4647", # bj-semaphore
|
||||
"100.81.26.3:4647", # ash1d
|
||||
"100.103.147.94:4647", # ash2e
|
||||
"100.90.159.68:4647", # ch2
|
||||
"100.86.141.112:4647", # ch3
|
||||
"100.98.209.50:4647", # bj-onecloud1
|
||||
"100.120.225.29:4647" # de
|
||||
]
|
||||
host_volume "fnsync" {
|
||||
path = "/mnt/fnsync"
|
||||
read_only = false
|
||||
}
|
||||
# 禁用Docker驱动,只使用Podman
|
||||
options {
|
||||
"driver.raw_exec.enable" = "1"
|
||||
"driver.exec.enable" = "1"
|
||||
}
|
||||
}
|
||||
|
||||
# 配置Podman插件目录
|
||||
plugin_dir = "/opt/nomad/plugins"
|
||||
|
||||
addresses {
|
||||
http = "{{ ansible_host }}"
|
||||
rpc = "{{ ansible_host }}"
|
||||
serf = "{{ ansible_host }}"
|
||||
}
|
||||
|
||||
advertise {
|
||||
http = "{{ ansible_host }}:4646"
|
||||
rpc = "{{ ansible_host }}:4647"
|
||||
serf = "{{ ansible_host }}:4648"
|
||||
}
|
||||
|
||||
consul {
|
||||
address = "100.116.158.95:8500"
|
||||
}
|
||||
|
||||
# 配置Podman驱动
|
||||
plugin "podman" {
|
||||
config {
|
||||
volumes {
|
||||
enabled = true
|
||||
}
|
||||
logging {
|
||||
type = "journald"
|
||||
}
|
||||
gc {
|
||||
container = true
|
||||
}
|
||||
}
|
||||
}
|
||||
dest: "{{ nomad_config_dir }}/nomad.hcl"
|
||||
owner: root
|
||||
group: root
|
||||
mode: '0644'
|
||||
|
||||
- name: 启动Nomad服务
|
||||
systemd:
|
||||
name: nomad
|
||||
state: restarted
|
||||
enabled: yes
|
||||
daemon_reload: yes
|
||||
|
||||
- name: 检查Nomad服务状态
|
||||
command: systemctl status nomad
|
||||
register: nomad_status
|
||||
changed_when: false
|
||||
|
||||
- name: 显示Nomad服务状态
|
||||
debug:
|
||||
var: nomad_status.stdout_lines
|
||||
|
|
@ -1,104 +0,0 @@
|
|||
---
|
||||
- name: 配置Nomad客户端节点
|
||||
hosts: target_nodes
|
||||
become: yes
|
||||
vars:
|
||||
nomad_config_dir: /etc/nomad.d
|
||||
|
||||
tasks:
|
||||
- name: 创建Nomad配置目录
|
||||
file:
|
||||
path: "{{ nomad_config_dir }}"
|
||||
state: directory
|
||||
owner: root
|
||||
group: root
|
||||
mode: '0755'
|
||||
|
||||
- name: 复制Nomad客户端配置
|
||||
copy:
|
||||
content: |
|
||||
datacenter = "dc1"
|
||||
data_dir = "/opt/nomad/data"
|
||||
log_level = "INFO"
|
||||
bind_addr = "0.0.0.0"
|
||||
|
||||
server {
|
||||
enabled = false
|
||||
}
|
||||
|
||||
client {
|
||||
enabled = true
|
||||
# 配置七姐妹服务器地址
|
||||
servers = [
|
||||
"100.116.158.95:4647", # bj-semaphore
|
||||
"100.81.26.3:4647", # ash1d
|
||||
"100.103.147.94:4647", # ash2e
|
||||
"100.90.159.68:4647", # ch2
|
||||
"100.86.141.112:4647", # ch3
|
||||
"100.98.209.50:4647", # bj-onecloud1
|
||||
"100.120.225.29:4647" # de
|
||||
]
|
||||
host_volume "fnsync" {
|
||||
path = "/mnt/fnsync"
|
||||
read_only = false
|
||||
}
|
||||
# 禁用Docker驱动,只使用Podman
|
||||
options {
|
||||
"driver.raw_exec.enable" = "1"
|
||||
"driver.exec.enable" = "1"
|
||||
}
|
||||
}
|
||||
|
||||
# 配置Podman插件目录
|
||||
plugin_dir = "/opt/nomad/plugins"
|
||||
|
||||
addresses {
|
||||
http = "{{ ansible_host }}"
|
||||
rpc = "{{ ansible_host }}"
|
||||
serf = "{{ ansible_host }}"
|
||||
}
|
||||
|
||||
advertise {
|
||||
http = "{{ ansible_host }}:4646"
|
||||
rpc = "{{ ansible_host }}:4647"
|
||||
serf = "{{ ansible_host }}:4648"
|
||||
}
|
||||
|
||||
consul {
|
||||
address = "100.116.158.95:8500"
|
||||
}
|
||||
|
||||
# 配置Podman驱动
|
||||
plugin "podman" {
|
||||
config {
|
||||
volumes {
|
||||
enabled = true
|
||||
}
|
||||
logging {
|
||||
type = "journald"
|
||||
}
|
||||
gc {
|
||||
container = true
|
||||
}
|
||||
}
|
||||
}
|
||||
dest: "{{ nomad_config_dir }}/nomad.hcl"
|
||||
owner: root
|
||||
group: root
|
||||
mode: '0644'
|
||||
|
||||
- name: 启动Nomad服务
|
||||
systemd:
|
||||
name: nomad
|
||||
state: restarted
|
||||
enabled: yes
|
||||
daemon_reload: yes
|
||||
|
||||
- name: 检查Nomad服务状态
|
||||
command: systemctl status nomad
|
||||
register: nomad_status
|
||||
changed_when: false
|
||||
|
||||
- name: 显示Nomad服务状态
|
||||
debug:
|
||||
var: nomad_status.stdout_lines
|
||||
|
|
@ -0,0 +1,44 @@
|
|||
---
|
||||
- name: 统一配置所有Nomad节点
|
||||
hosts: nomad_nodes
|
||||
become: yes
|
||||
|
||||
tasks:
|
||||
- name: 备份当前Nomad配置
|
||||
copy:
|
||||
src: /etc/nomad.d/nomad.hcl
|
||||
dest: /etc/nomad.d/nomad.hcl.bak
|
||||
remote_src: yes
|
||||
ignore_errors: yes
|
||||
|
||||
- name: 生成统一Nomad配置
|
||||
template:
|
||||
src: ../templates/nomad-unified.hcl.j2
|
||||
dest: /etc/nomad.d/nomad.hcl
|
||||
owner: root
|
||||
group: root
|
||||
mode: '0644'
|
||||
|
||||
- name: 重启Nomad服务
|
||||
systemd:
|
||||
name: nomad
|
||||
state: restarted
|
||||
enabled: yes
|
||||
daemon_reload: yes
|
||||
|
||||
- name: 等待Nomad服务就绪
|
||||
wait_for:
|
||||
port: 4646
|
||||
host: "{{ inventory_hostname }}.tailnet-68f9.ts.net"
|
||||
delay: 10
|
||||
timeout: 60
|
||||
ignore_errors: yes
|
||||
|
||||
- name: 检查Nomad服务状态
|
||||
command: systemctl status nomad
|
||||
register: nomad_status
|
||||
changed_when: false
|
||||
|
||||
- name: 显示Nomad服务状态
|
||||
debug:
|
||||
var: nomad_status.stdout_lines
|
||||
|
|
@ -1,105 +0,0 @@
|
|||
---
|
||||
- name: 部署韩国节点Nomad配置
|
||||
hosts: ch2,ch3
|
||||
become: yes
|
||||
gather_facts: no
|
||||
vars:
|
||||
nomad_config_dir: "/etc/nomad.d"
|
||||
nomad_config_file: "{{ nomad_config_dir }}/nomad.hcl"
|
||||
source_config_dir: "/root/mgmt/infrastructure/configs/server"
|
||||
|
||||
tasks:
|
||||
- name: 获取主机名短名称(去掉.global后缀)
|
||||
set_fact:
|
||||
short_hostname: "{{ inventory_hostname | regex_replace('\\.global$', '') }}"
|
||||
|
||||
- name: 确保 Nomad 配置目录存在
|
||||
file:
|
||||
path: "{{ nomad_config_dir }}"
|
||||
state: directory
|
||||
owner: root
|
||||
group: root
|
||||
mode: '0755'
|
||||
|
||||
- name: 部署 Nomad 配置文件到韩国节点
|
||||
copy:
|
||||
src: "{{ source_config_dir }}/nomad-{{ short_hostname }}.hcl"
|
||||
dest: "{{ nomad_config_file }}"
|
||||
owner: root
|
||||
group: root
|
||||
mode: '0644'
|
||||
backup: yes
|
||||
notify: restart nomad
|
||||
|
||||
- name: 检查 Nomad 二进制文件位置
|
||||
shell: which nomad || find /usr -name nomad 2>/dev/null | head -1
|
||||
register: nomad_binary_path
|
||||
failed_when: nomad_binary_path.stdout == ""
|
||||
|
||||
- name: 创建/更新 Nomad systemd 服务文件
|
||||
copy:
|
||||
dest: "/etc/systemd/system/nomad.service"
|
||||
owner: root
|
||||
group: root
|
||||
mode: '0644'
|
||||
content: |
|
||||
[Unit]
|
||||
Description=Nomad
|
||||
Documentation=https://www.nomadproject.io/
|
||||
Requires=network-online.target
|
||||
After=network-online.target
|
||||
|
||||
[Service]
|
||||
Type=notify
|
||||
User=root
|
||||
Group=root
|
||||
ExecStart={{ nomad_binary_path.stdout }} agent -config=/etc/nomad.d/nomad.hcl
|
||||
ExecReload=/bin/kill -HUP $MAINPID
|
||||
KillMode=process
|
||||
Restart=on-failure
|
||||
LimitNOFILE=65536
|
||||
|
||||
[Install]
|
||||
WantedBy=multi-user.target
|
||||
notify: restart nomad
|
||||
|
||||
- name: 确保 Nomad 数据目录存在
|
||||
file:
|
||||
path: "/opt/nomad/data"
|
||||
state: directory
|
||||
owner: root
|
||||
group: root
|
||||
mode: '0755'
|
||||
|
||||
- name: 重新加载 systemd daemon
|
||||
systemd:
|
||||
daemon_reload: yes
|
||||
|
||||
- name: 启用并启动 Nomad 服务
|
||||
systemd:
|
||||
name: nomad
|
||||
enabled: yes
|
||||
state: started
|
||||
|
||||
- name: 等待 Nomad 服务启动
|
||||
wait_for:
|
||||
port: 4646
|
||||
host: "{{ ansible_host }}"
|
||||
delay: 5
|
||||
timeout: 30
|
||||
ignore_errors: yes
|
||||
|
||||
- name: 显示 Nomad 服务状态
|
||||
command: systemctl status nomad
|
||||
register: nomad_status
|
||||
changed_when: false
|
||||
|
||||
- name: 显示 Nomad 服务状态信息
|
||||
debug:
|
||||
var: nomad_status.stdout_lines
|
||||
|
||||
handlers:
|
||||
- name: restart nomad
|
||||
systemd:
|
||||
name: nomad
|
||||
state: restarted
|
||||
|
|
@ -1,105 +0,0 @@
|
|||
---
|
||||
- name: 部署韩国节点Nomad配置
|
||||
hosts: ch2,ch3
|
||||
become: yes
|
||||
gather_facts: no
|
||||
vars:
|
||||
nomad_config_dir: "/etc/nomad.d"
|
||||
nomad_config_file: "{{ nomad_config_dir }}/nomad.hcl"
|
||||
source_config_dir: "/root/mgmt/infrastructure/configs/server"
|
||||
|
||||
tasks:
|
||||
- name: 获取主机名短名称(去掉后缀)
|
||||
set_fact:
|
||||
short_hostname: "{{ inventory_hostname | regex_replace('\\$', '') }}"
|
||||
|
||||
- name: 确保 Nomad 配置目录存在
|
||||
file:
|
||||
path: "{{ nomad_config_dir }}"
|
||||
state: directory
|
||||
owner: root
|
||||
group: root
|
||||
mode: '0755'
|
||||
|
||||
- name: 部署 Nomad 配置文件到韩国节点
|
||||
copy:
|
||||
src: "{{ source_config_dir }}/nomad-{{ short_hostname }}.hcl"
|
||||
dest: "{{ nomad_config_file }}"
|
||||
owner: root
|
||||
group: root
|
||||
mode: '0644'
|
||||
backup: yes
|
||||
notify: restart nomad
|
||||
|
||||
- name: 检查 Nomad 二进制文件位置
|
||||
shell: which nomad || find /usr -name nomad 2>/dev/null | head -1
|
||||
register: nomad_binary_path
|
||||
failed_when: nomad_binary_path.stdout == ""
|
||||
|
||||
- name: 创建/更新 Nomad systemd 服务文件
|
||||
copy:
|
||||
dest: "/etc/systemd/system/nomad.service"
|
||||
owner: root
|
||||
group: root
|
||||
mode: '0644'
|
||||
content: |
|
||||
[Unit]
|
||||
Description=Nomad
|
||||
Documentation=https://www.nomadproject.io/
|
||||
Requires=network-online.target
|
||||
After=network-online.target
|
||||
|
||||
[Service]
|
||||
Type=notify
|
||||
User=root
|
||||
Group=root
|
||||
ExecStart={{ nomad_binary_path.stdout }} agent -config=/etc/nomad.d/nomad.hcl
|
||||
ExecReload=/bin/kill -HUP $MAINPID
|
||||
KillMode=process
|
||||
Restart=on-failure
|
||||
LimitNOFILE=65536
|
||||
|
||||
[Install]
|
||||
WantedBy=multi-user.target
|
||||
notify: restart nomad
|
||||
|
||||
- name: 确保 Nomad 数据目录存在
|
||||
file:
|
||||
path: "/opt/nomad/data"
|
||||
state: directory
|
||||
owner: root
|
||||
group: root
|
||||
mode: '0755'
|
||||
|
||||
- name: 重新加载 systemd daemon
|
||||
systemd:
|
||||
daemon_reload: yes
|
||||
|
||||
- name: 启用并启动 Nomad 服务
|
||||
systemd:
|
||||
name: nomad
|
||||
enabled: yes
|
||||
state: started
|
||||
|
||||
- name: 等待 Nomad 服务启动
|
||||
wait_for:
|
||||
port: 4646
|
||||
host: "{{ ansible_host }}"
|
||||
delay: 5
|
||||
timeout: 30
|
||||
ignore_errors: yes
|
||||
|
||||
- name: 显示 Nomad 服务状态
|
||||
command: systemctl status nomad
|
||||
register: nomad_status
|
||||
changed_when: false
|
||||
|
||||
- name: 显示 Nomad 服务状态信息
|
||||
debug:
|
||||
var: nomad_status.stdout_lines
|
||||
|
||||
handlers:
|
||||
- name: restart nomad
|
||||
systemd:
|
||||
name: nomad
|
||||
state: restarted
|
||||
|
|
@ -0,0 +1,73 @@
|
|||
---
|
||||
- name: 修正Nomad节点的Consul角色配置
|
||||
hosts: nomad_nodes
|
||||
become: yes
|
||||
vars:
|
||||
consul_addresses: "master.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500"
|
||||
|
||||
tasks:
|
||||
- name: 备份原始Nomad配置
|
||||
copy:
|
||||
src: /etc/nomad.d/nomad.hcl
|
||||
dest: /etc/nomad.d/nomad.hcl.bak_{{ ansible_date_time.iso8601 }}
|
||||
remote_src: yes
|
||||
|
||||
- name: 检查节点角色
|
||||
shell: grep -A 1 "server {" /etc/nomad.d/nomad.hcl | grep "enabled = true" | wc -l
|
||||
register: is_server
|
||||
changed_when: false
|
||||
|
||||
- name: 检查节点角色
|
||||
shell: grep -A 1 "client {" /etc/nomad.d/nomad.hcl | grep "enabled = true" | wc -l
|
||||
register: is_client
|
||||
changed_when: false
|
||||
|
||||
- name: 修正服务器节点的Consul配置
|
||||
blockinfile:
|
||||
path: /etc/nomad.d/nomad.hcl
|
||||
marker: "# {mark} ANSIBLE MANAGED BLOCK - CONSUL CONFIG"
|
||||
block: |
|
||||
consul {
|
||||
address = "{{ consul_addresses }}"
|
||||
server_service_name = "nomad"
|
||||
client_service_name = "nomad-client"
|
||||
auto_advertise = true
|
||||
server_auto_join = true
|
||||
client_auto_join = false
|
||||
}
|
||||
replace: true
|
||||
when: is_server.stdout == "1"
|
||||
|
||||
- name: 修正客户端节点的Consul配置
|
||||
blockinfile:
|
||||
path: /etc/nomad.d/nomad.hcl
|
||||
marker: "# {mark} ANSIBLE MANAGED BLOCK - CONSUL CONFIG"
|
||||
block: |
|
||||
consul {
|
||||
address = "{{ consul_addresses }}"
|
||||
server_service_name = "nomad"
|
||||
client_service_name = "nomad-client"
|
||||
auto_advertise = true
|
||||
server_auto_join = false
|
||||
client_auto_join = true
|
||||
}
|
||||
replace: true
|
||||
when: is_client.stdout == "1"
|
||||
|
||||
- name: 重启Nomad服务
|
||||
systemd:
|
||||
name: nomad
|
||||
state: restarted
|
||||
enabled: yes
|
||||
daemon_reload: yes
|
||||
|
||||
- name: 等待Nomad服务启动
|
||||
wait_for:
|
||||
port: 4646
|
||||
host: "{{ ansible_host }}"
|
||||
timeout: 30
|
||||
|
||||
- name: 显示节点角色和配置
|
||||
debug:
|
||||
msg: "节点 {{ inventory_hostname }} 是 {{ '服务器' if is_server.stdout == '1' else '客户端' }} 节点,Consul配置已更新"
|
||||
|
||||
|
|
@ -0,0 +1,43 @@
|
|||
---
|
||||
- name: 更新所有Nomad节点的Consul配置
|
||||
hosts: nomad_nodes
|
||||
become: yes
|
||||
vars:
|
||||
consul_addresses: "master.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500"
|
||||
|
||||
tasks:
|
||||
- name: 备份原始Nomad配置
|
||||
copy:
|
||||
src: /etc/nomad.d/nomad.hcl
|
||||
dest: /etc/nomad.d/nomad.hcl.backup.{{ ansible_date_time.epoch }}
|
||||
remote_src: yes
|
||||
backup: yes
|
||||
|
||||
- name: 更新Nomad Consul配置
|
||||
lineinfile:
|
||||
path: /etc/nomad.d/nomad.hcl
|
||||
regexp: '^\s*address\s*=\s*".*"'
|
||||
line: ' address = "{{ consul_addresses }}"'
|
||||
state: present
|
||||
|
||||
- name: 重启Nomad服务
|
||||
systemd:
|
||||
name: nomad
|
||||
state: restarted
|
||||
enabled: yes
|
||||
daemon_reload: yes
|
||||
|
||||
- name: 等待Nomad服务启动
|
||||
wait_for:
|
||||
port: 4646
|
||||
host: "{{ ansible_host }}"
|
||||
timeout: 30
|
||||
|
||||
- name: 检查Nomad服务状态
|
||||
systemd:
|
||||
name: nomad
|
||||
register: nomad_status
|
||||
|
||||
- name: 显示Nomad服务状态
|
||||
debug:
|
||||
msg: "节点 {{ inventory_hostname }} Nomad服务状态: {{ nomad_status.status.ActiveState }}"
|
||||
|
|
@ -0,0 +1,26 @@
|
|||
---
|
||||
- name: 紧急回滚 - 恢复直连Consul配置
|
||||
hosts: nomad_nodes
|
||||
become: yes
|
||||
|
||||
tasks:
|
||||
- name: 🚨 紧急回滚Consul配置
|
||||
replace:
|
||||
path: /etc/nomad.d/nomad.hcl
|
||||
regexp: 'address = "hcp1.tailnet-68f9.ts.net:80"'
|
||||
replace: 'address = "100.117.106.136:8500"'
|
||||
notify: restart nomad
|
||||
|
||||
- name: ✅ 验证回滚配置
|
||||
shell: grep "address.*=" /etc/nomad.d/nomad.hcl
|
||||
register: rollback_config
|
||||
|
||||
- name: 📋 显示回滚后配置
|
||||
debug:
|
||||
msg: "回滚后配置: {{ rollback_config.stdout }}"
|
||||
|
||||
handlers:
|
||||
- name: restart nomad
|
||||
systemd:
|
||||
name: nomad
|
||||
state: restarted
|
||||
|
|
@ -2,20 +2,20 @@ datacenter = "dc1"
|
|||
data_dir = "/opt/nomad/data"
|
||||
plugin_dir = "/opt/nomad/plugins"
|
||||
log_level = "INFO"
|
||||
name = "{{ client_name }}"
|
||||
name = "{{ inventory_hostname }}"
|
||||
|
||||
bind_addr = "{{ client_ip }}"
|
||||
bind_addr = "{{ inventory_hostname }}.tailnet-68f9.ts.net"
|
||||
|
||||
addresses {
|
||||
http = "{{ client_ip }}"
|
||||
rpc = "{{ client_ip }}"
|
||||
serf = "{{ client_ip }}"
|
||||
http = "{{ inventory_hostname }}.tailnet-68f9.ts.net"
|
||||
rpc = "{{ inventory_hostname }}.tailnet-68f9.ts.net"
|
||||
serf = "{{ inventory_hostname }}.tailnet-68f9.ts.net"
|
||||
}
|
||||
|
||||
advertise {
|
||||
http = "{{ client_ip }}:4646"
|
||||
rpc = "{{ client_ip }}:4647"
|
||||
serf = "{{ client_ip }}:4648"
|
||||
http = "{{ inventory_hostname }}.tailnet-68f9.ts.net:4646"
|
||||
rpc = "{{ inventory_hostname }}.tailnet-68f9.ts.net:4647"
|
||||
serf = "{{ inventory_hostname }}.tailnet-68f9.ts.net:4648"
|
||||
}
|
||||
|
||||
ports {
|
||||
|
|
@ -30,15 +30,17 @@ server {
|
|||
|
||||
client {
|
||||
enabled = true
|
||||
# 配置七仙女服务器地址,使用短名
|
||||
network_interface = "tailscale0"
|
||||
|
||||
# 配置七仙女服务器地址,使用完整FQDN
|
||||
servers = [
|
||||
"semaphore:4647", # bj-semaphore
|
||||
"ash1d:4647", # ash1d
|
||||
"ash2e:4647", # ash2e
|
||||
"ch2:4647", # ch2
|
||||
"ch3:4647", # ch3
|
||||
"onecloud1:4647", # bj-onecloud1
|
||||
"de:4647" # de
|
||||
"semaphore.tailnet-68f9.ts.net:4647",
|
||||
"ash1d.tailnet-68f9.ts.net:4647",
|
||||
"ash2e.tailnet-68f9.ts.net:4647",
|
||||
"ch2.tailnet-68f9.ts.net:4647",
|
||||
"ch3.tailnet-68f9.ts.net:4647",
|
||||
"onecloud1.tailnet-68f9.ts.net:4647",
|
||||
"de.tailnet-68f9.ts.net:4647"
|
||||
]
|
||||
|
||||
# 配置host volumes
|
||||
|
|
@ -52,6 +54,18 @@ client {
|
|||
"driver.raw_exec.enable" = "1"
|
||||
"driver.exec.enable" = "1"
|
||||
}
|
||||
|
||||
# 配置节点元数据
|
||||
meta {
|
||||
consul = "true"
|
||||
consul_version = "1.21.5"
|
||||
consul_server = {% if inventory_hostname in ['master', 'ash3c', 'warden'] %}"true"{% else %}"false"{% endif %}
|
||||
}
|
||||
|
||||
# 激进的垃圾清理策略
|
||||
gc_interval = "5m"
|
||||
gc_disk_usage_threshold = 80
|
||||
gc_inode_usage_threshold = 70
|
||||
}
|
||||
|
||||
plugin "nomad-driver-podman" {
|
||||
|
|
@ -64,13 +78,26 @@ plugin "nomad-driver-podman" {
|
|||
}
|
||||
|
||||
consul {
|
||||
address = "master:8500,ash3c:8500,warden:8500"
|
||||
address = "master.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500"
|
||||
server_service_name = "nomad"
|
||||
client_service_name = "nomad-client"
|
||||
auto_advertise = true
|
||||
server_auto_join = true
|
||||
client_auto_join = true
|
||||
}
|
||||
|
||||
vault {
|
||||
enabled = true
|
||||
address = "http://master:8200,http://ash3c:8200,http://warden:8200"
|
||||
address = "http://master.tailnet-68f9.ts.net:8200,http://ash3c.tailnet-68f9.ts.net:8200,http://warden.tailnet-68f9.ts.net:8200"
|
||||
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
|
||||
create_from_role = "nomad-cluster"
|
||||
tls_skip_verify = true
|
||||
}
|
||||
|
||||
telemetry {
|
||||
collection_interval = "1s"
|
||||
disable_hostname = false
|
||||
prometheus_metrics = true
|
||||
publish_allocation_metrics = true
|
||||
publish_node_metrics = true
|
||||
}
|
||||
|
|
@ -4,12 +4,18 @@ plugin_dir = "/opt/nomad/plugins"
|
|||
log_level = "INFO"
|
||||
name = "{{ server_name }}"
|
||||
|
||||
bind_addr = "{{ server_ip }}"
|
||||
bind_addr = "{{ server_name }}.tailnet-68f9.ts.net"
|
||||
|
||||
addresses {
|
||||
http = "{{ server_ip }}"
|
||||
rpc = "{{ server_ip }}"
|
||||
serf = "{{ server_ip }}"
|
||||
http = "{{ server_name }}.tailnet-68f9.ts.net"
|
||||
rpc = "{{ server_name }}.tailnet-68f9.ts.net"
|
||||
serf = "{{ server_name }}.tailnet-68f9.ts.net"
|
||||
}
|
||||
|
||||
advertise {
|
||||
http = "{{ server_name }}.tailnet-68f9.ts.net:4646"
|
||||
rpc = "{{ server_name }}.tailnet-68f9.ts.net:4647"
|
||||
serf = "{{ server_name }}.tailnet-68f9.ts.net:4648"
|
||||
}
|
||||
|
||||
ports {
|
||||
|
|
@ -20,8 +26,14 @@ ports {
|
|||
|
||||
server {
|
||||
enabled = true
|
||||
bootstrap_expect = 3
|
||||
retry_join = ["semaphore", "ash1d", "ash2e", "ch2", "ch3", "onecloud1", "de"]
|
||||
bootstrap_expect = 7
|
||||
retry_join = [
|
||||
{%- for server in groups['nomad_servers'] -%}
|
||||
{%- if server != inventory_hostname -%}
|
||||
"{{ server }}.tailnet-68f9.ts.net"{% if not loop.last %},{% endif %}
|
||||
{%- endif -%}
|
||||
{%- endfor -%}
|
||||
]
|
||||
}
|
||||
|
||||
client {
|
||||
|
|
@ -38,12 +50,17 @@ plugin "nomad-driver-podman" {
|
|||
}
|
||||
|
||||
consul {
|
||||
address = "master:8500,ash3c:8500,warden:8500"
|
||||
address = "master.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500"
|
||||
server_service_name = "nomad"
|
||||
client_service_name = "nomad-client"
|
||||
auto_advertise = true
|
||||
server_auto_join = true
|
||||
client_auto_join = true
|
||||
}
|
||||
|
||||
vault {
|
||||
enabled = true
|
||||
address = "http://master:8200,http://ash3c:8200,http://warden:8200"
|
||||
address = "http://master.tailnet-68f9.ts.net:8200,http://ash3c.tailnet-68f9.ts.net:8200,http://warden.tailnet-68f9.ts.net:8200"
|
||||
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
|
||||
create_from_role = "nomad-cluster"
|
||||
tls_skip_verify = true
|
||||
|
|
|
|||
|
|
@ -0,0 +1,81 @@
|
|||
datacenter = "dc1"
|
||||
data_dir = "/opt/nomad/data"
|
||||
plugin_dir = "/opt/nomad/plugins"
|
||||
log_level = "INFO"
|
||||
name = "{{ inventory_hostname }}"
|
||||
|
||||
bind_addr = "{{ inventory_hostname }}.tailnet-68f9.ts.net"
|
||||
|
||||
addresses {
|
||||
http = "{{ inventory_hostname }}.tailnet-68f9.ts.net"
|
||||
rpc = "{{ inventory_hostname }}.tailnet-68f9.ts.net"
|
||||
serf = "{{ inventory_hostname }}.tailnet-68f9.ts.net"
|
||||
}
|
||||
|
||||
advertise {
|
||||
http = "{{ inventory_hostname }}.tailnet-68f9.ts.net:4646"
|
||||
rpc = "{{ inventory_hostname }}.tailnet-68f9.ts.net:4647"
|
||||
serf = "{{ inventory_hostname }}.tailnet-68f9.ts.net:4648"
|
||||
}
|
||||
|
||||
ports {
|
||||
http = 4646
|
||||
rpc = 4647
|
||||
serf = 4648
|
||||
}
|
||||
|
||||
server {
|
||||
enabled = {{ 'true' if inventory_hostname in groups['nomad_servers'] else 'false' }}
|
||||
{% if inventory_hostname in groups['nomad_servers'] %}
|
||||
bootstrap_expect = 3
|
||||
retry_join = [
|
||||
"semaphore.tailnet-68f9.ts.net",
|
||||
"ash1d.tailnet-68f9.ts.net",
|
||||
"ash2e.tailnet-68f9.ts.net",
|
||||
"ch2.tailnet-68f9.ts.net",
|
||||
"ch3.tailnet-68f9.ts.net",
|
||||
"onecloud1.tailnet-68f9.ts.net",
|
||||
"de.tailnet-68f9.ts.net"
|
||||
]
|
||||
{% endif %}
|
||||
}
|
||||
|
||||
client {
|
||||
enabled = true
|
||||
|
||||
meta {
|
||||
consul = "true"
|
||||
consul_version = "1.21.5"
|
||||
}
|
||||
|
||||
# 激进的垃圾清理策略
|
||||
gc_interval = "5m"
|
||||
gc_disk_usage_threshold = 80
|
||||
gc_inode_usage_threshold = 70
|
||||
}
|
||||
|
||||
plugin "nomad-driver-podman" {
|
||||
config {
|
||||
socket_path = "unix:///run/podman/podman.sock"
|
||||
volumes {
|
||||
enabled = true
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
consul {
|
||||
address = "master.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500"
|
||||
server_service_name = "nomad"
|
||||
client_service_name = "nomad-client"
|
||||
auto_advertise = true
|
||||
server_auto_join = true
|
||||
client_auto_join = true
|
||||
}
|
||||
|
||||
vault {
|
||||
enabled = true
|
||||
address = "http://master.tailnet-68f9.ts.net:8200,http://ash3c.tailnet-68f9.ts.net:8200,http://warden.tailnet-68f9.ts.net:8200"
|
||||
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
|
||||
create_from_role = "nomad-cluster"
|
||||
tls_skip_verify = true
|
||||
}
|
||||
|
|
@ -0,0 +1,45 @@
|
|||
---
|
||||
- name: 实现路由反射器架构 - 所有节点通过Traefik访问Consul
|
||||
hosts: nomad_nodes
|
||||
become: yes
|
||||
vars:
|
||||
traefik_endpoint: "hcp1.tailnet-68f9.ts.net:80"
|
||||
|
||||
tasks:
|
||||
- name: 📊 显示架构优化信息
|
||||
debug:
|
||||
msg: |
|
||||
🎯 实现BGP路由反射器模式
|
||||
📉 连接数优化:Full Mesh (54连接) → Star Topology (21连接)
|
||||
🌐 所有节点 → Traefik → Consul Leader
|
||||
run_once: true
|
||||
|
||||
- name: 🔍 检查当前Consul配置
|
||||
shell: grep "address.*=" /etc/nomad.d/nomad.hcl
|
||||
register: current_config
|
||||
ignore_errors: yes
|
||||
|
||||
- name: 📋 显示当前配置
|
||||
debug:
|
||||
msg: "当前配置: {{ current_config.stdout }}"
|
||||
|
||||
- name: 🔧 更新Consul地址为Traefik端点
|
||||
replace:
|
||||
path: /etc/nomad.d/nomad.hcl
|
||||
regexp: 'address = "[^"]*"'
|
||||
replace: 'address = "{{ traefik_endpoint }}"'
|
||||
notify: restart nomad
|
||||
|
||||
- name: ✅ 验证配置更新
|
||||
shell: grep "address.*=" /etc/nomad.d/nomad.hcl
|
||||
register: new_config
|
||||
|
||||
- name: 📋 显示新配置
|
||||
debug:
|
||||
msg: "新配置: {{ new_config.stdout }}"
|
||||
|
||||
handlers:
|
||||
- name: restart nomad
|
||||
systemd:
|
||||
name: nomad
|
||||
state: restarted
|
||||
|
|
@ -1,69 +0,0 @@
|
|||
---
|
||||
- name: Update Nomad configuration for ch2 server
|
||||
hosts: ch2
|
||||
become: yes
|
||||
tasks:
|
||||
- name: Backup original nomad.hcl
|
||||
copy:
|
||||
src: /etc/nomad.d/nomad.hcl
|
||||
dest: /etc/nomad.d/nomad.hcl.bak
|
||||
remote_src: yes
|
||||
|
||||
- name: Update nomad.hcl with retry_join configuration
|
||||
copy:
|
||||
content: |
|
||||
datacenter = "dc1"
|
||||
data_dir = "/opt/nomad/data"
|
||||
plugin_dir = "/opt/nomad/plugins"
|
||||
log_level = "INFO"
|
||||
name = "ch2"
|
||||
|
||||
bind_addr = "100.90.159.68"
|
||||
|
||||
addresses {
|
||||
http = "100.90.159.68"
|
||||
rpc = "100.90.159.68"
|
||||
serf = "100.90.159.68"
|
||||
}
|
||||
|
||||
ports {
|
||||
http = 4646
|
||||
rpc = 4647
|
||||
serf = 4648
|
||||
}
|
||||
|
||||
server {
|
||||
enabled = true
|
||||
retry_join = ["100.81.26.3:4648", "100.103.147.94:4648", "100.86.141.112:4648", "100.120.225.29:4648", "100.98.209.50:4648", "100.116.158.95:4648"]
|
||||
}
|
||||
|
||||
client {
|
||||
enabled = false
|
||||
}
|
||||
|
||||
plugin "nomad-driver-podman" {
|
||||
config {
|
||||
socket_path = "unix:///run/podman/podman.sock"
|
||||
volumes {
|
||||
enabled = true
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
consul {
|
||||
address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden
|
||||
}
|
||||
|
||||
vault {
|
||||
enabled = true
|
||||
address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden
|
||||
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
|
||||
create_from_role = "nomad-cluster"
|
||||
tls_skip_verify = true
|
||||
}
|
||||
dest: /etc/nomad.d/nomad.hcl
|
||||
|
||||
- name: Restart Nomad service
|
||||
systemd:
|
||||
name: nomad
|
||||
state: restarted
|
||||
|
|
@ -1,69 +0,0 @@
|
|||
---
|
||||
- name: Update Nomad configuration for ch2 server with correct name
|
||||
hosts: ch2
|
||||
become: yes
|
||||
tasks:
|
||||
- name: Backup original nomad.hcl
|
||||
copy:
|
||||
src: /etc/nomad.d/nomad.hcl
|
||||
dest: /etc/nomad.d/nomad.hcl.bak2
|
||||
remote_src: yes
|
||||
|
||||
- name: Update nomad.hcl with correct name and retry_join configuration
|
||||
copy:
|
||||
content: |
|
||||
datacenter = "dc1"
|
||||
data_dir = "/opt/nomad/data"
|
||||
plugin_dir = "/opt/nomad/plugins"
|
||||
log_level = "INFO"
|
||||
name = "ch2"
|
||||
|
||||
bind_addr = "100.90.159.68"
|
||||
|
||||
addresses {
|
||||
http = "100.90.159.68"
|
||||
rpc = "100.90.159.68"
|
||||
serf = "100.90.159.68"
|
||||
}
|
||||
|
||||
ports {
|
||||
http = 4646
|
||||
rpc = 4647
|
||||
serf = 4648
|
||||
}
|
||||
|
||||
server {
|
||||
enabled = true
|
||||
retry_join = ["100.81.26.3:4648", "100.103.147.94:4648", "100.86.141.112:4648", "100.120.225.29:4648", "100.98.209.50:4648", "100.116.158.95:4648"]
|
||||
}
|
||||
|
||||
client {
|
||||
enabled = false
|
||||
}
|
||||
|
||||
plugin "nomad-driver-podman" {
|
||||
config {
|
||||
socket_path = "unix:///run/podman/podman.sock"
|
||||
volumes {
|
||||
enabled = true
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
consul {
|
||||
address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden
|
||||
}
|
||||
|
||||
vault {
|
||||
enabled = true
|
||||
address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden
|
||||
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
|
||||
create_from_role = "nomad-cluster"
|
||||
tls_skip_verify = true
|
||||
}
|
||||
dest: /etc/nomad.d/nomad.hcl
|
||||
|
||||
- name: Restart Nomad service
|
||||
systemd:
|
||||
name: nomad
|
||||
state: restarted
|
||||
|
|
@ -1,69 +0,0 @@
|
|||
---
|
||||
- name: Update Nomad configuration for ch2 server with correct name
|
||||
hosts: ch2
|
||||
become: yes
|
||||
tasks:
|
||||
- name: Backup original nomad.hcl
|
||||
copy:
|
||||
src: /etc/nomad.d/nomad.hcl
|
||||
dest: /etc/nomad.d/nomad.hcl.bak2
|
||||
remote_src: yes
|
||||
|
||||
- name: Update nomad.hcl with correct name and retry_join configuration
|
||||
copy:
|
||||
content: |
|
||||
datacenter = "dc1"
|
||||
data_dir = "/opt/nomad/data"
|
||||
plugin_dir = "/opt/nomad/plugins"
|
||||
log_level = "INFO"
|
||||
name = "ch2"
|
||||
|
||||
bind_addr = "100.90.159.68"
|
||||
|
||||
addresses {
|
||||
http = "100.90.159.68"
|
||||
rpc = "100.90.159.68"
|
||||
serf = "100.90.159.68"
|
||||
}
|
||||
|
||||
ports {
|
||||
http = 4646
|
||||
rpc = 4647
|
||||
serf = 4648
|
||||
}
|
||||
|
||||
server {
|
||||
enabled = true
|
||||
retry_join = ["100.81.26.3:4648", "100.103.147.94:4648", "100.86.141.112:4648", "100.120.225.29:4648", "100.98.209.50:4648", "100.116.158.95:4648"]
|
||||
}
|
||||
|
||||
client {
|
||||
enabled = false
|
||||
}
|
||||
|
||||
plugin "nomad-driver-podman" {
|
||||
config {
|
||||
socket_path = "unix:///run/podman/podman.sock"
|
||||
volumes {
|
||||
enabled = true
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
consul {
|
||||
address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden
|
||||
}
|
||||
|
||||
vault {
|
||||
enabled = true
|
||||
address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden
|
||||
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
|
||||
create_from_role = "nomad-cluster"
|
||||
tls_skip_verify = true
|
||||
}
|
||||
dest: /etc/nomad.d/nomad.hcl
|
||||
|
||||
- name: Restart Nomad service
|
||||
systemd:
|
||||
name: nomad
|
||||
state: restarted
|
||||
|
|
@ -1,69 +0,0 @@
|
|||
---
|
||||
- name: Update Nomad configuration for ch2 server with correct name
|
||||
hosts: ch2
|
||||
become: yes
|
||||
tasks:
|
||||
- name: Backup original nomad.hcl
|
||||
copy:
|
||||
src: /etc/nomad.d/nomad.hcl
|
||||
dest: /etc/nomad.d/nomad.hcl.bak2
|
||||
remote_src: yes
|
||||
|
||||
- name: Update nomad.hcl with correct name and retry_join configuration
|
||||
copy:
|
||||
content: |
|
||||
datacenter = "dc1"
|
||||
data_dir = "/opt/nomad/data"
|
||||
plugin_dir = "/opt/nomad/plugins"
|
||||
log_level = "INFO"
|
||||
name = "ch2"
|
||||
|
||||
bind_addr = "100.90.159.68"
|
||||
|
||||
addresses {
|
||||
http = "100.90.159.68"
|
||||
rpc = "100.90.159.68"
|
||||
serf = "100.90.159.68"
|
||||
}
|
||||
|
||||
ports {
|
||||
http = 4646
|
||||
rpc = 4647
|
||||
serf = 4648
|
||||
}
|
||||
|
||||
server {
|
||||
enabled = true
|
||||
retry_join = ["100.81.26.3:4648", "100.103.147.94:4648", "100.86.141.112:4648", "100.120.225.29:4648", "100.98.209.50:4648", "100.116.158.95:4648"]
|
||||
}
|
||||
|
||||
client {
|
||||
enabled = false
|
||||
}
|
||||
|
||||
plugin "nomad-driver-podman" {
|
||||
config {
|
||||
socket_path = "unix:///run/podman/podman.sock"
|
||||
volumes {
|
||||
enabled = true
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
consul {
|
||||
address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden
|
||||
}
|
||||
|
||||
vault {
|
||||
enabled = true
|
||||
address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden
|
||||
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
|
||||
create_from_role = "nomad-cluster"
|
||||
tls_skip_verify = true
|
||||
}
|
||||
dest: /etc/nomad.d/nomad.hcl
|
||||
|
||||
- name: Restart Nomad service
|
||||
systemd:
|
||||
name: nomad
|
||||
state: restarted
|
||||
|
|
@ -1,69 +0,0 @@
|
|||
---
|
||||
- name: Update Nomad configuration for ch2 server with correct name format
|
||||
hosts: ch2
|
||||
become: yes
|
||||
tasks:
|
||||
- name: Backup original nomad.hcl
|
||||
copy:
|
||||
src: /etc/nomad.d/nomad.hcl
|
||||
dest: /etc/nomad.d/nomad.hcl.bak3
|
||||
remote_src: yes
|
||||
|
||||
- name: Update nomad.hcl with correct name format and retry_join configuration
|
||||
copy:
|
||||
content: |
|
||||
datacenter = "dc1"
|
||||
data_dir = "/opt/nomad/data"
|
||||
plugin_dir = "/opt/nomad/plugins"
|
||||
log_level = "INFO"
|
||||
name = "ch2"
|
||||
|
||||
bind_addr = "100.90.159.68"
|
||||
|
||||
addresses {
|
||||
http = "100.90.159.68"
|
||||
rpc = "100.90.159.68"
|
||||
serf = "100.90.159.68"
|
||||
}
|
||||
|
||||
ports {
|
||||
http = 4646
|
||||
rpc = 4647
|
||||
serf = 4648
|
||||
}
|
||||
|
||||
server {
|
||||
enabled = true
|
||||
retry_join = ["100.81.26.3:4648", "100.103.147.94:4648", "100.86.141.112:4648", "100.120.225.29:4648", "100.98.209.50:4648", "100.116.158.95:4648"]
|
||||
}
|
||||
|
||||
client {
|
||||
enabled = false
|
||||
}
|
||||
|
||||
plugin "nomad-driver-podman" {
|
||||
config {
|
||||
socket_path = "unix:///run/podman/podman.sock"
|
||||
volumes {
|
||||
enabled = true
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
consul {
|
||||
address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden
|
||||
}
|
||||
|
||||
vault {
|
||||
enabled = true
|
||||
address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden
|
||||
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
|
||||
create_from_role = "nomad-cluster"
|
||||
tls_skip_verify = true
|
||||
}
|
||||
dest: /etc/nomad.d/nomad.hcl
|
||||
|
||||
- name: Restart Nomad service
|
||||
systemd:
|
||||
name: nomad
|
||||
state: restarted
|
||||
|
|
@ -1,69 +0,0 @@
|
|||
---
|
||||
- name: Update Nomad configuration for ch2 server with correct name format
|
||||
hosts: ch2
|
||||
become: yes
|
||||
tasks:
|
||||
- name: Backup original nomad.hcl
|
||||
copy:
|
||||
src: /etc/nomad.d/nomad.hcl
|
||||
dest: /etc/nomad.d/nomad.hcl.bak3
|
||||
remote_src: yes
|
||||
|
||||
- name: Update nomad.hcl with correct name format and retry_join configuration
|
||||
copy:
|
||||
content: |
|
||||
datacenter = "dc1"
|
||||
data_dir = "/opt/nomad/data"
|
||||
plugin_dir = "/opt/nomad/plugins"
|
||||
log_level = "INFO"
|
||||
name = "ch2"
|
||||
|
||||
bind_addr = "100.90.159.68"
|
||||
|
||||
addresses {
|
||||
http = "100.90.159.68"
|
||||
rpc = "100.90.159.68"
|
||||
serf = "100.90.159.68"
|
||||
}
|
||||
|
||||
ports {
|
||||
http = 4646
|
||||
rpc = 4647
|
||||
serf = 4648
|
||||
}
|
||||
|
||||
server {
|
||||
enabled = true
|
||||
retry_join = ["100.81.26.3:4648", "100.103.147.94:4648", "100.86.141.112:4648", "100.120.225.29:4648", "100.98.209.50:4648", "100.116.158.95:4648"]
|
||||
}
|
||||
|
||||
client {
|
||||
enabled = false
|
||||
}
|
||||
|
||||
plugin "nomad-driver-podman" {
|
||||
config {
|
||||
socket_path = "unix:///run/podman/podman.sock"
|
||||
volumes {
|
||||
enabled = true
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
consul {
|
||||
address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden
|
||||
}
|
||||
|
||||
vault {
|
||||
enabled = true
|
||||
address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden
|
||||
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
|
||||
create_from_role = "nomad-cluster"
|
||||
tls_skip_verify = true
|
||||
}
|
||||
dest: /etc/nomad.d/nomad.hcl
|
||||
|
||||
- name: Restart Nomad service
|
||||
systemd:
|
||||
name: nomad
|
||||
state: restarted
|
||||
|
|
@ -1,69 +0,0 @@
|
|||
---
|
||||
- name: Update Nomad configuration for ch2 server with correct name format
|
||||
hosts: ch2
|
||||
become: yes
|
||||
tasks:
|
||||
- name: Backup original nomad.hcl
|
||||
copy:
|
||||
src: /etc/nomad.d/nomad.hcl
|
||||
dest: /etc/nomad.d/nomad.hcl.bak3
|
||||
remote_src: yes
|
||||
|
||||
- name: Update nomad.hcl with correct name format and retry_join configuration
|
||||
copy:
|
||||
content: |
|
||||
datacenter = "dc1"
|
||||
data_dir = "/opt/nomad/data"
|
||||
plugin_dir = "/opt/nomad/plugins"
|
||||
log_level = "INFO"
|
||||
name = "ch2"
|
||||
|
||||
bind_addr = "100.90.159.68"
|
||||
|
||||
addresses {
|
||||
http = "100.90.159.68"
|
||||
rpc = "100.90.159.68"
|
||||
serf = "100.90.159.68"
|
||||
}
|
||||
|
||||
ports {
|
||||
http = 4646
|
||||
rpc = 4647
|
||||
serf = 4648
|
||||
}
|
||||
|
||||
server {
|
||||
enabled = true
|
||||
retry_join = ["100.81.26.3:4648", "100.103.147.94:4648", "100.86.141.112:4648", "100.120.225.29:4648", "100.98.209.50:4648", "100.116.158.95:4648"]
|
||||
}
|
||||
|
||||
client {
|
||||
enabled = false
|
||||
}
|
||||
|
||||
plugin "nomad-driver-podman" {
|
||||
config {
|
||||
socket_path = "unix:///run/podman/podman.sock"
|
||||
volumes {
|
||||
enabled = true
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
consul {
|
||||
address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden
|
||||
}
|
||||
|
||||
vault {
|
||||
enabled = true
|
||||
address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden
|
||||
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
|
||||
create_from_role = "nomad-cluster"
|
||||
tls_skip_verify = true
|
||||
}
|
||||
dest: /etc/nomad.d/nomad.hcl
|
||||
|
||||
- name: Restart Nomad service
|
||||
systemd:
|
||||
name: nomad
|
||||
state: restarted
|
||||
|
|
@ -0,0 +1,144 @@
|
|||
# Consul 集群架构设计
|
||||
|
||||
## 当前架构
|
||||
|
||||
### Consul Servers (3个)
|
||||
- **master** (100.117.106.136) - 韩国,当前 Leader
|
||||
- **warden** (100.122.197.112) - 北京,Voter
|
||||
- **ash3c** (100.116.80.94) - 美国,Voter
|
||||
|
||||
### Consul Clients (1个+)
|
||||
- **hcp1** (100.97.62.111) - 北京,系统级 Client
|
||||
|
||||
## 架构优势
|
||||
|
||||
### ✅ 当前设计的优点:
|
||||
1. **高可用** - 3个 Server 可容忍 1个故障
|
||||
2. **地理分布** - 跨三个地区,容灾能力强
|
||||
3. **性能优化** - 每个地区有本地 Server
|
||||
4. **扩展性** - Client 可按需添加
|
||||
|
||||
### ✅ 为什么 hcp1 作为 Client 是正确的:
|
||||
1. **服务就近注册** - Traefik 运行在 hcp1,本地 Client 效率最高
|
||||
2. **减少网络延迟** - 避免跨网络的服务注册
|
||||
3. **健康检查优化** - 本地 Client 可以更准确地检查服务状态
|
||||
4. **故障隔离** - hcp1 Client 故障不影响集群共识
|
||||
|
||||
## 扩展建议
|
||||
|
||||
### 🎯 理想的 Client 部署:
|
||||
```
|
||||
每个运行业务服务的节点都应该有 Consul Client:
|
||||
|
||||
┌─────────────┬─────────────┬─────────────┐
|
||||
│ Server │ Client │ 业务服务 │
|
||||
├─────────────┼─────────────┼─────────────┤
|
||||
│ master │ ✓ (内置) │ Consul │
|
||||
│ warden │ ✓ (内置) │ Consul │
|
||||
│ ash3c │ ✓ (内置) │ Consul │
|
||||
│ hcp1 │ ✓ (独立) │ Traefik │
|
||||
│ 其他节点... │ 建议添加 │ 其他服务... │
|
||||
└─────────────┴─────────────┴─────────────┘
|
||||
```
|
||||
|
||||
### 🔧 Client 配置标准:
|
||||
```bash
|
||||
# hcp1 的 Consul Client 配置 (/etc/consul.d/consul.hcl)
|
||||
datacenter = "dc1"
|
||||
data_dir = "/opt/consul"
|
||||
log_level = "INFO"
|
||||
node_name = "hcp1"
|
||||
bind_addr = "100.97.62.111"
|
||||
|
||||
# 连接到所有 Server
|
||||
retry_join = [
|
||||
"100.117.106.136", # master
|
||||
"100.122.197.112", # warden
|
||||
"100.116.80.94" # ash3c
|
||||
]
|
||||
|
||||
# Client 模式
|
||||
server = false
|
||||
ui_config {
|
||||
enabled = false # Client 不需要 UI
|
||||
}
|
||||
|
||||
# 服务发现和健康检查
|
||||
ports {
|
||||
grpc = 8502
|
||||
http = 8500
|
||||
}
|
||||
|
||||
connect {
|
||||
enabled = true
|
||||
}
|
||||
```
## Service Registration Strategy

### 🎯 Recommended approaches:
1. **Nomad auto-registration** (preferred)
   - via Nomad's `consul` configuration
   - handles the service lifecycle automatically
   - integrated with the deployment flow

2. **Local Client registration** (current approach)
   - via the local Consul Client
   - managed manually, but more flexible
   - suits complex registration logic

3. **Catalog API registration** (fallback; see the sketch after the migration snippet below)
   - directly through the Consul API
   - bypasses sync problems
   - used for failure recovery

### 🔄 Migrating to Nomad registration:
```hcl
# In the Nomad Client configuration
consul {
  address = "127.0.0.1:8500"  # local Consul Client
  server_service_name = "nomad"
  client_service_name = "nomad-client"
  auto_advertise = true
  server_auto_join = false
  client_auto_join = true
}
```

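For the fallback path, option 3 boils down to a direct HTTP call against an agent; a minimal sketch, where the service name `web`, its ID, and its port are illustrative placeholders rather than values from this cluster:

```bash
# Register a service through the agent's HTTP API so it is tied to
# this agent's anti-entropy and health machinery.
curl -s -X PUT http://127.0.0.1:8500/v1/agent/service/register -d '{
  "ID": "web-1",
  "Name": "web",
  "Address": "100.97.62.111",
  "Port": 8080
}'

# Confirm it is visible cluster-wide through the catalog.
curl -s http://127.0.0.1:8500/v1/catalog/service/web | jq '.[].ServiceID'
```
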
## Monitoring and Maintenance

### 📊 Key metrics:
- **Raft index sync** - make sure all Servers hold consistent data
- **Client connection state** - watch Client-to-Server connectivity
- **Service registration latency** - track the time from registration to discoverability
- **Health check status** - monitor service health (see the quick check below)

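The first and last of these can be eyeballed from any node without extra tooling; a sketch assuming default agent ports:

```bash
# Raft state and last leader contact, straight from the local agent.
consul info | grep -E "last_contact|commit_index|state"

# Health status of every registered instance of a service (here: consul itself).
curl -s http://127.0.0.1:8500/v1/health/service/consul | jq '.[].Checks[].Status'
```
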
### 🛠️ Maintenance scripts:
```bash
# Cluster health check
./scripts/consul-cluster-health.sh

# Service sync verification
./scripts/verify-service-sync.sh

# Failure recovery
./scripts/consul-recovery.sh
```

## Failure Handling

### 🚨 Common problems:
1. **Server failure** - automatic failover, no intervention needed
2. **Client disconnect** - restart the Client; it rejoins automatically
3. **Service sync problems** - force a sync through the Catalog API
4. **Network partition** - the Raft algorithm handles it automatically

### 🔧 Recovery steps (a quick peer check follows this list):
1. Check the cluster state
2. Verify network connectivity
3. Restart the affected components
4. Force services to re-register

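Step 1 usually starts with the operator commands; a sketch assuming the CLI can reach any server agent:

```bash
# Who is in the Raft quorum, and who leads?
consul operator raft list-peers

# Leader address as seen by this agent (an empty body means trouble).
curl -s http://127.0.0.1:8500/v1/status/leader
```
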

---

**Conclusion**: The current architecture is sound, and running hcp1 as a Client is the right choice. Keep the existing architecture, and consider adding a Consul Client to the other business nodes.

@ -0,0 +1,188 @@
# Consul Architecture Optimization Plan

## Current Pain Points

### Network latency today:
- **Within Beijing**: ~0.6ms (same office)
- **Beijing ↔ Korea**: ~72ms
- **Beijing ↔ US**: ~215ms

### Node distribution:
- **Beijing**: warden, hcp1, influxdb1, browser (4)
- **Korea**: master (1)
- **US**: ash3c (1)

## Architecture Trade-offs

### 🏛️ Option 1: current geo-distributed architecture
```
Consul Servers: master (Korea) + warden (Beijing) + ash3c (US)

Pros:
✅ Genuine high availability - any single region can fail and the cluster keeps working
✅ Disaster recovery - earthquakes, power cuts, and network outages all have a fallback
✅ Global load spreading

Cons:
❌ Write latency ~200ms (trans-Pacific consensus)
❌ High network cost
❌ Complex operations
```

### 🏢 Option 2: Beijing-centralized architecture
```
Consul Servers: warden + hcp1 + influxdb1 (all in Beijing)

Pros:
✅ Ultra-low latency ~0.6ms
✅ Simple operations
✅ Low cost

Cons:
❌ Single point of failure - a Beijing outage takes everything down
❌ No disaster recovery
❌ An echo chamber - Korea and the US are permanently outvoted
```

### 🎯 Option 3: hybrid architecture (recommended)
```
Primary cluster (Beijing): 3 Servers - handles day-to-day traffic
Backup cluster (global): 3 Servers - disaster recovery

Or:
Local Consul (Beijing): fast local service discovery
Global Consul (distributed): cross-region service discovery
```

## 🚀 Recommended Rollout

### Stage 1: tune the current architecture
```bash
# 1. Adjust Raft parameters for trans-Pacific latency
consul_config {
  raft_protocol = 3
  raft_snapshot_threshold = 16384
  raft_trailing_logs = 10000
}

# 2. Enable local caching
consul_config {
  cache {
    entry_fetch_max_burst = 42
    entry_fetch_rate = 30
  }
}

# 3. Network tuning
consul_config {
  performance {
    raft_multiplier = 5  # tolerate more latency
  }
}
```

### Stage 2: deploy local Consul Clients
```bash
# Deploy a Consul Client on every Beijing node.
# (deploy_consul_client stands in for the actual provisioning step.)
nodes=(hcp1 influxdb1 browser)

for node in "${nodes[@]}"; do
  deploy_consul_client "$node" \
    --servers "warden:8300" \
    --retry-join "warden.tailnet-68f9.ts.net:8300" \
    --retry-join "master.tailnet-68f9.ts.net:8300" \
    --retry-join "ash3c.tailnet-68f9.ts.net:8300"  # prefer the local server first
done
```

### Stage 3: smart routing
```bash
# Geography-aware routing configuration
consul_config {
  # Beijing nodes connect to warden first
  # Korean nodes connect to master first
  # US nodes connect to ash3c first

  connect {
    enabled = true
  }

  # Local-first policy
  node_meta {
    region = "beijing"
    zone = "office-1"
  }
}
```

## 🎯 Final Recommendation

### For this setup:

**Keep the current 3-node geographic distribution, but tune for performance:**

1. **Accept the latency** - 200ms is tolerable for most applications
2. **Optimize local access** - deploy more Consul Clients
3. **Cache aggressively** - keep hot data in local caches
4. **Split reads and writes** - reads go local, writes go through Raft (a stale-read sketch follows this list)

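Consul exposes the read side of that split directly: any request can opt into a stale read served by the nearest server instead of the leader. A minimal sketch, assuming default ports:

```bash
# Default (consistent) read: forwarded to the leader, so it pays the cross-ocean RTT.
curl -s http://127.0.0.1:8500/v1/kv/app/config

# Stale read: answered by whichever server is closest, at the cost of
# possibly lagging the leader by a few Raft entries.
curl -s "http://127.0.0.1:8500/v1/kv/app/config?stale"
```
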
|
||||
### Concrete optimizations:

```bash
# 1. Deploy a Consul client on all 4 Beijing nodes
./scripts/deploy-consul-clients.sh beijing

# 2. Configure a local-first policy
consul_config {
  datacenter = "dc1"
  node_meta = {
    region = "beijing"
  }

  # Local read optimization
  ui_config {
    enabled = true
  }

  # Cache configuration
  cache {
    entry_fetch_max_burst = 42
  }
}

# 3. Application-layer optimizations
# - Use a local DNS cache
# - Batch operations to reduce Raft writes
# - Update non-critical data asynchronously
```
## 🔍 Monitoring Metrics

```bash
# Key metrics to watch
consul_metrics = [
  "consul.raft.commitTime",          # Raft commit latency
  "consul.raft.leader.lastContact",  # leader contact latency
  "consul.dns.stale_queries",        # stale DNS queries
  "consul.catalog.register_time"     # service registration time
]
```
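A hedged example of pulling these from the agent's telemetry endpoint (the `jq` filter is an assumption about which samples matter here):

```bash
# Dump the raft-related timer samples from the local agent
curl -s http://127.0.0.1:8500/v1/agent/metrics \
  | jq '.Samples[] | select(.Name | startswith("consul.raft"))'
```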
## 💡 Conclusion

**Your analysis is completely correct:**

- ✅ **Geographic distribution really does cost latency**
- ✅ **Centralizing in Beijing really is an echo chamber**
- ✅ **This is the fundamental tradeoff of distributed systems**

**Best strategy: keep the current architecture and soften the latency through optimization.**

Because:
1. **200ms latency is acceptable for most workloads**
2. **Genuine high availability matters more than latency**
3. **Caching and tuning can substantially improve the experience**

Your technical judgment is sound - this really is a tradeoff problem with no perfect answer.

@ -0,0 +1,170 @@
# Consul Service Registration Solution

## Background

The trans-Pacific Nomad + Consul cluster ran into the following problems:
1. **RFC1918 address problem** - Nomad auto-registration uses private IPs, which are unreachable across networks
2. **Consul leader rotation** - services registered on a single node disappear when the leader changes
3. **Service flapping** - failing health checks cause services to register/deregister repeatedly

## Solution

### 1. Redundant multi-node registration

**Core idea: register the service with every Consul node at once, so leader rotation cannot drop it.**

#### Consul cluster nodes:
- `master.tailnet-68f9.ts.net:8500` (Korea, usually the leader)
- `warden.tailnet-68f9.ts.net:8500` (Beijing, preferred node)
- `ash3c.tailnet-68f9.ts.net:8500` (US, standby node)
#### Registration script: `scripts/register-traefik-to-all-consul.sh`

```bash
#!/bin/bash
# Register the Traefik service with all three Consul nodes

CONSUL_NODES=(
  "master.tailnet-68f9.ts.net:8500"
  "warden.tailnet-68f9.ts.net:8500"
  "ash3c.tailnet-68f9.ts.net:8500"
)

TRAEFIK_IP="100.97.62.111"  # Tailscale IP, not RFC1918
ALLOC_ID=$(nomad job allocs traefik-consul-lb | head -2 | tail -1 | awk '{print $1}')

# Register with every node...
```
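A minimal sketch of the elided loop, registering the same payload with each node; the ID and tags mirror the manual re-registration example later in this document:

```bash
for node in "${CONSUL_NODES[@]}"; do
  curl -X PUT "http://$node/v1/agent/service/register" -d '{
    "ID": "traefik-consul-lb",
    "Name": "consul-lb",
    "Address": "'"$TRAEFIK_IP"'",
    "Port": 80,
    "Tags": ["consul", "loadbalancer", "traefik"]
  }'
done
```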
### 2. Use Tailscale addresses

**Key configuration:**
- Service address: `100.97.62.111` (Tailscale IP)
- Avoid RFC1918 private addresses (`192.168.x.x`)
- Reachable across networks

### 3. Relaxed health checks

**Tuned for the trans-Pacific network:**
- Interval: `30s` (instead of the 10s default)
- Timeout: `15s` (instead of the 5s default)
- Prevents false positives caused by network latency (example below)
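A hedged example of a check registered with those relaxed timings (the check name and the `/ping` endpoint are assumptions):

```bash
curl -X PUT http://127.0.0.1:8500/v1/agent/check/register -d '{
  "Name": "consul-lb-http",
  "ServiceID": "traefik-consul-lb",
  "HTTP": "http://100.97.62.111/ping",
  "Interval": "30s",
  "Timeout": "15s"
}'
```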
## Persistence Options

### Option A: Nomad job integration (recommended)

Add a lifecycle-hook task to the Traefik job:

```hcl
task "consul-registrar" {
  driver = "exec"

  lifecycle {
    hook    = "poststart"
    sidecar = false
  }

  config {
    command = "/local/register-services.sh"
  }
}
```
### Option B: Scheduled task

```bash
# Add to crontab
*/5 * * * * /root/mgmt/scripts/register-traefik-to-all-consul.sh
```

### Option C: Consul Template watcher

Use consul-template to watch Traefik's status and re-register automatically, as sketched below.
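A minimal sketch of Option C, assuming consul-template is installed; the template path is hypothetical, and the registration script is re-run whenever the rendered service list changes:

```bash
consul-template \
  -consul-addr "warden.tailnet-68f9.ts.net:8500" \
  -template "/root/mgmt/templates/traefik.tpl:/tmp/traefik-services.txt:/root/mgmt/scripts/register-traefik-to-all-consul.sh"
```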
## Deployment Steps

1. **Deploy the simplified Traefik**:
```bash
nomad job run components/traefik/jobs/traefik.nomad
```

2. **Run the multi-node registration**:
```bash
./scripts/register-traefik-to-all-consul.sh
```

3. **Verify the registration**:
```bash
# Check every node
for node in master warden ash3c; do
  echo "=== $node ==="
  curl -s http://$node.tailnet-68f9.ts.net:8500/v1/catalog/services | jq 'keys[]' | grep -E "(consul-lb|traefik)"
done
```
## Troubleshooting

### Problem: services missing on the Beijing warden node

**Possible causes:**
1. Consul cluster sync lag
2. Network partition or connectivity problems
3. Failing health checks

**Diagnostic commands:**
```bash
# Check Consul cluster status
curl -s http://warden.tailnet-68f9.ts.net:8500/v1/status/peers

# Check local services
curl -s http://warden.tailnet-68f9.ts.net:8500/v1/agent/services

# Check health checks
curl -s http://warden.tailnet-68f9.ts.net:8500/v1/agent/checks
```

**Fix:**
```bash
# Force re-registration on warden
curl -X PUT http://warden.tailnet-68f9.ts.net:8500/v1/agent/service/register -d '{
  "ID": "traefik-consul-lb-manual",
  "Name": "consul-lb",
  "Address": "100.97.62.111",
  "Port": 80,
  "Tags": ["consul", "loadbalancer", "traefik", "manual"]
}'
```
## Monitoring and Maintenance

### Health check monitoring
```bash
# Check service health on every node
./scripts/check-consul-health.sh
```

### Periodic verification
```bash
# Daily verification script
./scripts/daily-consul-verification.sh
```

## Best Practices

1. **Optimize for geography** - prefer the geographically closest Consul node
2. **Register redundantly** - always register with every node to avoid a single point of failure
3. **Use Tailscale** - avoid RFC1918 addresses so services stay reachable across networks
4. **Relax the checks** - use relaxed health check parameters over trans-ocean links
5. **Document everything** - every configuration change gets a written record

## Access

- **Consul UI**: `https://hcp1.tailnet-68f9.ts.net/`
- **Traefik Dashboard**: `https://hcp1.tailnet-68f9.ts.net:8080/`

---

**Created**: 2025-10-02
**Last updated**: 2025-10-02
**Maintainer**: Infrastructure Team

@ -1,99 +0,0 @@
job "waypoint-server" {
|
||||
datacenters = ["dc1"]
|
||||
type = "service"
|
||||
|
||||
group "waypoint" {
|
||||
count = 1
|
||||
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
operator = "="
|
||||
value = "warden"
|
||||
}
|
||||
|
||||
network {
|
||||
port "ui" {
|
||||
static = 9701
|
||||
}
|
||||
|
||||
port "api" {
|
||||
static = 9702
|
||||
}
|
||||
|
||||
port "grpc" {
|
||||
static = 9703
|
||||
}
|
||||
}
|
||||
|
||||
task "server" {
|
||||
driver = "podman"
|
||||
|
||||
config {
|
||||
image = "hashicorp/waypoint:latest"
|
||||
ports = ["ui", "api", "grpc"]
|
||||
|
||||
args = [
|
||||
"server",
|
||||
"run",
|
||||
"-accept-tos",
|
||||
"-vvv",
|
||||
"-platform=nomad",
|
||||
"-nomad-host=${attr.nomad.advertise.address}",
|
||||
"-nomad-consul-service=true",
|
||||
"-nomad-consul-service-hostname=${attr.unique.hostname}",
|
||||
"-nomad-consul-datacenter=dc1",
|
||||
"-listen-grpc=0.0.0.0:9703",
|
||||
"-listen-http=0.0.0.0:9702",
|
||||
"-url-api=http://${attr.unique.hostname}:9702",
|
||||
"-url-ui=http://${attr.unique.hostname}:9701"
|
||||
]
|
||||
}
|
||||
|
||||
env {
|
||||
WAYPOINT_SERVER_DISABLE_MEMORY_DB = "true"
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 500
|
||||
memory = 1024
|
||||
}
|
||||
|
||||
service {
|
||||
name = "waypoint-ui"
|
||||
port = "ui"
|
||||
|
||||
check {
|
||||
name = "waypoint-ui-alive"
|
||||
type = "http"
|
||||
path = "/"
|
||||
interval = "10s"
|
||||
timeout = "2s"
|
||||
}
|
||||
}
|
||||
|
||||
service {
|
||||
name = "waypoint-api"
|
||||
port = "api"
|
||||
|
||||
check {
|
||||
name = "waypoint-api-alive"
|
||||
type = "tcp"
|
||||
interval = "10s"
|
||||
timeout = "2s"
|
||||
}
|
||||
}
|
||||
|
||||
volume_mount {
|
||||
volume = "waypoint-data"
|
||||
destination = "/data"
|
||||
read_only = false
|
||||
}
|
||||
}
|
||||
|
||||
volume "waypoint-data" {
|
||||
type = "host"
|
||||
read_only = false
|
||||
source = "waypoint-data"
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -1,47 +0,0 @@

# Full Nomad architecture configuration
# Merged inventory file, based on the latest configuration in the production directory

[nomad_servers]
# Server nodes (7 server nodes)
# Local machine, not managed here: bj-semaphore.global ansible_host=100.116.158.95 ansible_user=root ansible_password=3131 ansible_become_password=3131
ash1d.global ansible_host=100.81.26.3 ansible_user=ben ansible_password=3131 ansible_become_password=3131
ash2e.global ansible_host=100.103.147.94 ansible_user=ben ansible_password=3131 ansible_become_password=3131
ch2.global ansible_host=100.90.159.68 ansible_user=ben ansible_password=3131 ansible_become_password=3131
ch3.global ansible_host=100.86.141.112 ansible_user=ben ansible_password=3131 ansible_become_password=3131
onecloud1.global ansible_host=100.98.209.50 ansible_user=ben ansible_password=3131 ansible_become_password=3131
de.global ansible_host=100.120.225.29 ansible_user=ben ansible_password=3131 ansible_become_password=3131

[nomad_clients]
# Client nodes (6 client nodes, based on the production configuration)
hcp1 ansible_host=hcp1 ansible_user=root ansible_password=313131 ansible_become_password=313131
influxdb1 ansible_host=influxdb1 ansible_user=root ansible_password=313131 ansible_become_password=313131
warden ansible_host=warden ansible_user=ben ansible_password=3131 ansible_become_password=3131
browser ansible_host=browser ansible_user=root ansible_password=313131 ansible_become_password=313131
kr-master ansible_host=master ansible_port=60022 ansible_user=ben ansible_password=3131 ansible_become_password=3131
us-ash3c ansible_host=ash3c ansible_user=ben ansible_password=3131 ansible_become_password=3131

[nomad_nodes:children]
nomad_servers
nomad_clients

[nomad_nodes:vars]
# NFS configuration
nfs_server=snail
nfs_share=/fs/1000/nfs/Fnsync
mount_point=/mnt/fnsync

# Ansible configuration
ansible_ssh_common_args='-o StrictHostKeyChecking=no'

# Telegraf monitoring configuration (based on the production configuration)
client_ip="{{ ansible_host }}"
influxdb_url="http://influxdb1.tailnet-68f9.ts.net:8086"
influxdb_token="VU_dOCVZzqEHb9jSFsDe0bJlEBaVbiG4LqfoczlnmcbfrbmklSt904HJPL4idYGvVi0c2eHkYDi2zCTni7Ay4w=="
influxdb_org="seekkey"
influxdb_bucket="VPS"
telegraf_config_url="http://influxdb1.tailnet-68f9.ts.net:8086/api/v2/telegrafs/0f8a73496790c000"
collection_interval=30
disk_usage_warning=80
disk_usage_critical=90
telegraf_log_level="ERROR"
telegraf_disable_local_logs=true
@ -1,60 +0,0 @@

datacenter = "dc1"
data_dir   = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level  = "INFO"
name       = "us-ash3c"

bind_addr = "100.116.80.94"

addresses {
  http = "100.116.80.94"
  rpc  = "100.116.80.94"
  serf = "100.116.80.94"
}

ports {
  http = 4646
  rpc  = 4647
  serf = 4648
}

server {
  enabled = false
}

client {
  enabled           = true
  network_interface = "tailscale0"
  # Addresses of the seven server nodes ("seven sisters")
  servers = [
    "100.116.158.95:4647", # bj-semaphore
    "100.81.26.3:4647",    # ash1d
    "100.103.147.94:4647", # ash2e
    "100.90.159.68:4647",  # ch2
    "100.86.141.112:4647", # ch3
    "100.98.209.50:4647",  # bj-onecloud1
    "100.120.225.29:4647"  # de
  ]
}

plugin "nomad-driver-podman" {
  config {
    socket_path = "unix:///run/podman/podman.sock"
    volumes {
      enabled = true
    }
  }
}

consul {
  address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden
}

vault {
  enabled          = true
  address          = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden
  token            = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
  create_from_role = "nomad-cluster"
  tls_skip_verify  = true
}
@ -1,56 +0,0 @@

datacenter = "dc1"
data_dir   = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level  = "INFO"
name       = "kr-master"

bind_addr = "100.117.106.136"

addresses {
  http = "100.117.106.136"
  rpc  = "100.117.106.136"
  serf = "100.117.106.136"
}

ports {
  http = 4646
  rpc  = 4647
  serf = 4648
}

server {
  enabled = false
}

client {
  enabled           = true
  network_interface = "tailscale0"

  servers = [
    "100.116.158.95:4647", # semaphore
    "100.103.147.94:4647", # ash2e
    "100.81.26.3:4647",    # ash1d
    "100.90.159.68:4647"   # ch2
  ]
}

plugin "nomad-driver-podman" {
  config {
    socket_path = "unix:///run/podman/podman.sock"
    volumes {
      enabled = true
    }
  }
}

consul {
  address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden
}

vault {
  enabled          = true
  address          = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden
  token            = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
  create_from_role = "nomad-cluster"
  tls_skip_verify  = true
}
@ -1,56 +0,0 @@

datacenter = "dc1"
data_dir   = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level  = "INFO"
name       = "bj-warden"

bind_addr = "100.122.197.112"

addresses {
  http = "100.122.197.112"
  rpc  = "100.122.197.112"
  serf = "100.122.197.112"
}

ports {
  http = 4646
  rpc  = 4647
  serf = 4648
}

server {
  enabled = false
}

client {
  enabled           = true
  network_interface = "tailscale0"

  servers = [
    "100.116.158.95:4647", # semaphore
    "100.103.147.94:4647", # ash2e
    "100.81.26.3:4647",    # ash1d
    "100.90.159.68:4647"   # ch2
  ]
}

plugin "nomad-driver-podman" {
  config {
    socket_path = "unix:///run/podman/podman.sock"
    volumes {
      enabled = true
    }
  }
}

consul {
  address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden
}

vault {
  enabled          = true
  address          = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden
  token            = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
  create_from_role = "nomad-cluster"
  tls_skip_verify  = true
}
@ -1,51 +0,0 @@

datacenter = "dc1"
data_dir   = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level  = "INFO"
name       = "us-ash1d"

bind_addr = "100.81.26.3"

addresses {
  http = "100.81.26.3"
  rpc  = "100.81.26.3"
  serf = "100.81.26.3"
}

ports {
  http = 4646
  rpc  = 4647
  serf = 4648
}

server {
  enabled    = true
  retry_join = ["us-ash1d", "ash2e", "ch2", "ch3", "onecloud1", "de"]
}

client {
  enabled = false
}

plugin "nomad-driver-podman" {
  config {
    socket_path = "unix:///run/podman/podman.sock"
    volumes {
      enabled = true
    }
  }
}

consul {
  address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden
}

vault {
  enabled          = true
  address          = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden
  token            = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
  create_from_role = "nomad-cluster"
  tls_skip_verify  = true
}
@ -1,51 +0,0 @@

datacenter = "dc1"
data_dir   = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level  = "INFO"
name       = "us-ash2e"

bind_addr = "100.103.147.94"

addresses {
  http = "100.103.147.94"
  rpc  = "100.103.147.94"
  serf = "100.103.147.94"
}

ports {
  http = 4646
  rpc  = 4647
  serf = 4648
}

server {
  enabled    = true
  retry_join = ["us-ash2e", "ash1d", "ch2", "ch3", "onecloud1", "de"]
}

client {
  enabled = false
}

plugin "nomad-driver-podman" {
  config {
    socket_path = "unix:///run/podman/podman.sock"
    volumes {
      enabled = true
    }
  }
}

consul {
  address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden
}

vault {
  enabled          = true
  address          = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden
  token            = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
  create_from_role = "nomad-cluster"
  tls_skip_verify  = true
}
@ -1,51 +0,0 @@

datacenter = "dc1"
data_dir   = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level  = "INFO"
name       = "kr-ch2"

bind_addr = "100.90.159.68"

addresses {
  http = "100.90.159.68"
  rpc  = "100.90.159.68"
  serf = "100.90.159.68"
}

ports {
  http = 4646
  rpc  = 4647
  serf = 4648
}

server {
  enabled    = true
  retry_join = ["kr-ch2", "ash1d", "ash2e", "ch3", "onecloud1", "de"]
}

client {
  enabled = false
}

plugin "nomad-driver-podman" {
  config {
    socket_path = "unix:///run/podman/podman.sock"
    volumes {
      enabled = true
    }
  }
}

consul { # three nodes
  address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden
}

vault { # three nodes
  enabled          = true
  address          = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden
  token            = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
  create_from_role = "nomad-cluster"
  tls_skip_verify  = true
}
@ -1,51 +0,0 @@

datacenter = "dc1"
data_dir   = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level  = "INFO"
name       = "kr-ch3"

bind_addr = "100.86.141.112"

addresses {
  http = "100.86.141.112"
  rpc  = "100.86.141.112"
  serf = "100.86.141.112"
}

ports {
  http = 4646
  rpc  = 4647
  serf = 4648
}

server {
  enabled  = true
  data_dir = "/opt/nomad/data"
}

client {
  enabled = false
}

plugin "nomad-driver-podman" {
  config {
    socket_path = "unix:///run/podman/podman.sock"
    volumes {
      enabled = true
    }
  }
}

consul { # three nodes
  address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden
}

vault { # three nodes
  enabled          = true
  address          = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden
  token            = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
  create_from_role = "nomad-cluster"
  tls_skip_verify  = true
}
@ -1,50 +0,0 @@

datacenter = "dc1"
data_dir   = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level  = "INFO"
name       = "de"

bind_addr = "100.120.225.29"

addresses {
  http = "100.120.225.29"
  rpc  = "100.120.225.29"
  serf = "100.120.225.29"
}

ports {
  http = 4646
  rpc  = 4647
  serf = 4648
}

server {
  enabled = true
}

client {
  enabled = false
}

plugin "nomad-driver-podman" {
  config {
    socket_path = "unix:///run/podman/podman.sock"
    volumes {
      enabled = true
    }
  }
}

consul { # three nodes
  address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden
}

vault { # three nodes
  enabled          = true
  address          = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden
  token            = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
  create_from_role = "nomad-cluster"
  tls_skip_verify  = true
}
@ -1,50 +0,0 @@

datacenter = "dc1"
data_dir   = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level  = "INFO"
name       = "onecloud1"

bind_addr = "100.98.209.50"

addresses {
  http = "100.98.209.50"
  rpc  = "100.98.209.50"
  serf = "100.98.209.50"
}

ports {
  http = 4646
  rpc  = 4647
  serf = 4648
}

server {
  enabled = true
}

client {
  enabled = false
}

plugin "nomad-driver-podman" {
  config {
    socket_path = "unix:///run/podman/podman.sock"
    volumes {
      enabled = true
    }
  }
}

consul {
  address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden
}

vault {
  enabled          = true
  address          = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden
  token            = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
  create_from_role = "nomad-cluster"
  tls_skip_verify  = true
}
@ -1,51 +0,0 @@

datacenter = "dc1"
data_dir   = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level  = "INFO"
name       = "semaphore"

bind_addr = "100.116.158.95"

addresses {
  http = "100.116.158.95"
  rpc  = "100.116.158.95"
  serf = "100.116.158.95"
}

ports {
  http = 4646
  rpc  = 4647
  serf = 4648
}

server {
  enabled          = true
  bootstrap_expect = 3
}

client {
  enabled = false
}

plugin "nomad-driver-podman" {
  config {
    socket_path = "unix:///run/podman/podman.sock"
    volumes {
      enabled = true
    }
  }
}

consul {
  address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden
}

vault {
  enabled          = true
  address          = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden
  token            = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
  create_from_role = "nomad-cluster"
  tls_skip_verify  = true
}
@ -1 +0,0 @@

components/consul/jobs/
@ -1,37 +0,0 @@

# DigitalOcean token storage job
job "digitalocean-key-store" {
  datacenters = ["dc1"]
  type        = "batch"

  group "key-store" {
    task "store-key" {
      driver = "exec"

      config {
        command = "/bin/sh"
        args = [
          "-c",
          <<EOT
# Store the DigitalOcean token in Consul
curl -X PUT -H "X-Consul-Token: ${CONSUL_HTTP_TOKEN}" \
  http://127.0.0.1:8500/v1/kv/council/digitalocean/token \
  -d 'dop_v1_70582bb508873709d96debc7f2a2d04df2093144b2b15fe392dba83b88976376'

# Verify the token was stored
curl -s http://127.0.0.1:8500/v1/kv/council/digitalocean/token?raw
EOT
        ]
      }

      env {
        CONSUL_HTTP_ADDR  = "http://127.0.0.1:8500"
        CONSUL_HTTP_TOKEN = "root" # adjust to the actual Consul configuration
      }

      resources {
        cpu    = 100
        memory = 64
      }
    }
  }
}
@ -1,65 +0,0 @@

job "hybrid-nfs-app" {
  datacenters = ["dc1"]
  type        = "service"

  # Use a constraint to distinguish storage types
  constraint {
    attribute = "${attr.unique.hostname}"
    operator  = "regexp"
    value     = "semaphore"
  }

  group "app" {
    count = 1

    network {
      port "http" {
        static = 8080
      }
    }

    # The local machine (semaphore) uses a host volume
    volume "local-storage" {
      type      = "host"
      read_only = false
      source    = "local-fnsync"
    }

    task "web-app" {
      driver = "exec"

      config {
        command = "python3"
        args    = ["-m", "http.server", "8080", "--directory", "local/fnsync"]
      }

      template {
        data = <<EOH
<h1>Hybrid NFS App - Running on {{ env "attr.unique.hostname" }}</h1>
<p>Storage Type: {{ with eq (env "attr.unique.hostname") "semaphore" }}PVE Mount{{ else }}NFS{{ end }}</p>
<p>Timestamp: {{ now | date "2006-01-02 15:04:05" }}</p>
EOH
        destination = "local/fnsync/index.html"
      }

      resources {
        cpu    = 100
        memory = 128
      }

      service {
        name = "hybrid-nfs-app"
        port = "http"

        tags = ["hybrid", "nfs", "web"]

        check {
          type     = "http"
          path     = "/"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
@ -1,51 +0,0 @@

job "nfs-app-example" {
  datacenters = ["dc1"]
  type        = "service"

  group "app" {
    count = 1

    # Use the NFS storage volume
    volume "nfs-storage" {
      type      = "host"
      read_only = false
      source    = "nfs-fnsync"
    }

    task "web-app" {
      driver = "docker"

      config {
        image = "nginx:alpine"
        ports = ["http"]

        # Mount the NFS volume into the container
        mount {
          type     = "volume"
          target   = "/usr/share/nginx/html"
          source   = "nfs-storage"
          readonly = false
        }
      }

      resources {
        cpu    = 100
        memory = 128
      }

      service {
        name = "nfs-web-app"
        port = "http"

        tags = ["nfs", "web"]

        check {
          type     = "http"
          path     = "/"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
@ -1,34 +0,0 @@

job "nfs-storage-test" {
  datacenters = ["dc1"]
  type        = "batch"

  group "test" {
    count = 1

    volume "nfs-storage" {
      type      = "csi"
      read_only = false
      source    = "nfs-fnsync"
    }

    task "storage-test" {
      driver = "exec"

      volume_mount {
        volume      = "nfs-storage"
        destination = "/mnt/nfs"
        read_only   = false
      }

      config {
        command = "/bin/sh"
        args    = ["-c", "echo 'NFS Storage Test - $(hostname) - $(date)' > /mnt/nfs/test-$(hostname).txt && ls -la /mnt/nfs/"]
      }

      resources {
        cpu    = 50
        memory = 64
      }
    }
  }
}
@ -1 +0,0 @@

components/nomad/jobs/
@ -1,84 +0,0 @@

job "nfs-multi-type-example" {
  datacenters = ["dc1"]
  type        = "service"

  # Task group for the local LXC containers
  group "lxc-apps" {
    count = 2

    constraint {
      attribute = "${attr.unique.hostname}"
      operator  = "regexp"
      value     = "(influxdb|hcp)"
    }

    volume "lxc-nfs" {
      type      = "host"
      source    = "nfs-shared"
      read_only = false
    }

    task "lxc-app" {
      driver = "podman"

      config {
        image = "alpine:latest"
        args  = ["tail", "-f", "/dev/null"]
      }

      volume_mount {
        volume      = "lxc-nfs"
        destination = "/shared/lxc"
        read_only   = false
      }

      resources {
        cpu    = 100
        memory = 64
      }
    }
  }

  # Task group for the overseas PVE containers
  group "pve-apps" {
    count = 3

    constraint {
      attribute = "${attr.unique.hostname}"
      operator  = "regexp"
      value     = "(ash1d|ash2e|ash3c|ch2|ch3)"
    }

    volume "pve-nfs" {
      type      = "host"
      source    = "nfs-shared"
      read_only = false
    }

    task "pve-app" {
      driver = "podman"

      config {
        image = "alpine:latest"
        args  = ["tail", "-f", "/dev/null"]

        # Network tuning for the overseas nodes
        network_mode = "host"
      }

      volume_mount {
        volume      = "pve-nfs"
        destination = "/shared/pve"
        read_only   = false
      }

      resources {
        cpu    = 100
        memory = 64
        network {
          mbits = 5
        }
      }
    }
  }
}
@ -1,86 +0,0 @@

job "openfaas-functions" {
  datacenters = ["dc1"]
  type        = "service"

  group "hello-world" {
    count = 1

    constraint {
      attribute = "${node.unique.name}"
      operator  = "regexp"
      value     = "(master|ash3c|hcp)"
    }

    task "hello-world" {
      driver = "podman"

      config {
        image = "functions/hello-world:latest"
        ports = ["http"]
        env = {
          "fprocess" = "node index.js"
        }
      }

      resources {
        network {
          mbits = 10
          port "http" { static = 8080 }
        }
      }

      service {
        name = "hello-world"
        port = "http"
        tags = ["openfaas-function"]
        check {
          type     = "http"
          path     = "/"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }

  group "figlet" {
    count = 1

    constraint {
      attribute = "${node.unique.name}"
      operator  = "regexp"
      value     = "(master|ash3c|hcp)"
    }

    task "figlet" {
      driver = "podman"

      config {
        image = "functions/figlet:latest"
        ports = ["http"]
        env = {
          "fprocess" = "figlet"
        }
      }

      resources {
        network {
          mbits = 10
          port "http" { static = 8080 }
        }
      }

      service {
        name = "figlet"
        port = "http"
        tags = ["openfaas-function"]
        check {
          type     = "http"
          path     = "/"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
@ -1,176 +0,0 @@

job "openfaas" {
  datacenters = ["dc1"]
  type        = "service"

  group "openfaas-gateway" {
    count = 1

    constraint {
      attribute = "${node.unique.name}"
      operator  = "regexp"
      value     = "(master|ash3c|hcp)"
    }

    task "openfaas-gateway" {
      driver = "podman"

      config {
        image = "ghcr.io/openfaas/gateway:0.2.35"
        ports = ["http", "ui"]
        env = {
          "functions_provider_url" = "http://${NOMAD_IP_http}:8080"
          "read_timeout"           = "60s"
          "write_timeout"          = "60s"
          "upstream_timeout"       = "60s"
          "direct_functions"       = "true"
          "faas_nats_address"      = "nats://localhost:4222"
          "faas_nats_streaming"    = "true"
          "basic_auth"             = "true"
          "secret_mount_path"      = "/run/secrets"
          "scale_from_zero"        = "true"
        }
      }

      resources {
        network {
          mbits = 10
          port "http" { static = 8080 }
          port "ui" { static = 8081 }
        }
      }

      service {
        name = "openfaas-gateway"
        port = "http"
        check {
          type     = "http"
          path     = "/healthz"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }

  group "nats" {
    count = 1

    constraint {
      attribute = "${node.unique.name}"
      operator  = "regexp"
      value     = "(master|ash3c|hcp)"
    }

    task "nats" {
      driver = "podman"

      config {
        image = "nats-streaming:0.25.3"
        ports = ["nats"]
        args = [
          "-p", "4222",
          "-m", "8222",
          "-hbi", "5s",
          "-hbt", "5s",
          "-hbf", "2",
          "-SD",
          "-cid", "openfaas"
        ]
      }

      resources {
        network {
          mbits = 10
          port "nats" { static = 4222 }
        }
      }

      service {
        name = "nats"
        port = "nats"
        check {
          type     = "tcp"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }

  group "queue-worker" {
    count = 1

    constraint {
      attribute = "${node.unique.name}"
      operator  = "regexp"
      value     = "(master|ash3c|hcp)"
    }

    task "queue-worker" {
      driver = "podman"

      config {
        image = "ghcr.io/openfaas/queue-worker:0.12.2"
        env = {
          "gateway_url"         = "http://${NOMAD_IP_http}:8080"
          "faas_nats_address"   = "nats://localhost:4222"
          "faas_nats_streaming" = "true"
          "ack_wait"            = "5m"
          "write_debug"         = "true"
        }
      }

      resources {
        network {
          mbits = 10
        }
      }
    }
  }

  group "prometheus" {
    count = 1

    constraint {
      attribute = "${node.unique.name}"
      operator  = "regexp"
      value     = "(master|ash3c|hcp)"
    }

    task "prometheus" {
      driver = "podman"

      config {
        image = "prom/prometheus:v2.35.0"
        ports = ["prometheus"]
        volumes = [
          "/opt/openfaas/prometheus.yml:/etc/prometheus/prometheus.yml"
        ]
      }

      resources {
        network {
          mbits = 10
          port "prometheus" { static = 9090 }
        }
      }

      service {
        name = "prometheus"
        port = "prometheus"
        check {
          type     = "http"
          path     = "/-/healthy"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
@ -1,130 +0,0 @@

job "traefik" {
  datacenters = ["dc1"]
  type        = "service"

  update {
    max_parallel     = 1
    min_healthy_time = "10s"
    healthy_deadline = "3m"
    auto_revert      = true
  }

  group "traefik" {
    count = 1 # start with a single instance on the warden node

    # Constrain to the warden node only
    constraint {
      attribute = "${node.unique.name}"
      operator  = "="
      value     = "bj-warden"
    }

    restart {
      attempts = 3
      interval = "30m"
      delay    = "15s"
      mode     = "fail"
    }

    network {
      port "http" {
        static = 80
      }
      port "https" {
        static = 443
      }
      port "api" {
        static = 8080
      }
    }

    task "traefik" {
      driver = "exec"

      # Download the Traefik v3 binary
      artifact {
        source      = "https://github.com/traefik/traefik/releases/download/v3.1.5/traefik_v3.1.5_linux_amd64.tar.gz"
        destination = "local/"
        mode        = "file"
        options {
          archive = "true"
        }
      }

      # Dynamic configuration template
      template {
        data = <<EOF
# Traefik dynamic configuration - services discovered from Consul
http:
  routers:
    consul-master:
      rule: "Host(`consul-master.service.consul`)"
      service: consul-master
      entryPoints: ["http"]

  services:
    consul-master:
      loadBalancer:
        servers:
{{ range nomadService "consul" }}
{{ if contains .Tags "http" }}
          - url: "http://{{ .Address }}:{{ .Port }}"
{{ end }}
{{ end }}

# Consul Catalog provider
providers:
  consulCatalog:
    exposedByDefault: false
    prefix: "traefik"
    refreshInterval: 15s
    endpoint:
      address: "http://{{ with nomadService "consul" }}{{ range . }}{{ if contains .Tags "http" }}{{ .Address }}:{{ .Port }}{{ end }}{{ end }}{{ end }}"
    connectAware: true
    connectByDefault: false
EOF
        destination = "local/dynamic.yml"
        change_mode = "restart"
      }

      config {
        command = "local/traefik"
        args = [
          "--configfile=/root/mgmt/infrastructure/routes/traefik.yml",
          "--providers.file.filename=local/dynamic.yml",
          "--providers.file.watch=true"
        ]
      }

      env {
        NOMAD_ADDR = "http://${attr.unique.network.ip-address}:4646"
        # The Consul address is resolved dynamically through the template
      }

      resources {
        cpu    = 200
        memory = 256
      }

      service {
        name = "traefik-warden"
        port = "http"
        tags = [
          "traefik.enable=true",
          "traefik.http.routers.traefik-warden.rule=Host(`traefik.warden.consul`)",
          "traefik.http.routers.traefik-warden.service=api@internal",
          "traefik.http.routers.traefik-warden.entrypoints=api",
          "traefik.http.services.traefik-warden.loadbalancer.server.port=8080",
          "warden"
        ]

        check {
          type     = "http"
          path     = "/ping"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
@ -1 +0,0 @@

components/vault/jobs/
@ -1,228 +0,0 @@

#!/bin/bash
# Nomad multi-datacenter node bootstrap script
# Datacenter: ${datacenter}

set -e

# Logging helper
log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a /var/log/nomad-setup.log
}

log "Configuring Nomad node - datacenter: ${datacenter}"

# Update the system
log "Updating system packages..."
apt-get update -y
apt-get upgrade -y

# Install required packages
log "Installing required packages..."
apt-get install -y \
    curl \
    wget \
    unzip \
    jq \
    podman \
    htop \
    net-tools \
    vim

# Start Podman
log "Starting the Podman service..."
systemctl enable podman
systemctl start podman
usermod -aG podman ubuntu

# Install Nomad
log "Installing Nomad ${nomad_version}..."
cd /tmp
wget -q https://releases.hashicorp.com/nomad/${nomad_version}/nomad_${nomad_version}_linux_amd64.zip
unzip nomad_${nomad_version}_linux_amd64.zip
mv nomad /usr/local/bin/
chmod +x /usr/local/bin/nomad

# Create the Nomad user and directories
log "Creating the Nomad user and directories..."
useradd --system --home /etc/nomad.d --shell /bin/false nomad
mkdir -p /opt/nomad/data
mkdir -p /etc/nomad.d
mkdir -p /var/log/nomad
chown -R nomad:nomad /opt/nomad /etc/nomad.d /var/log/nomad

# Determine the local IP address
if [ "${bind_addr}" = "auto" ]; then
    # Try several methods to discover the IP
    BIND_ADDR=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4 2>/dev/null || \
                curl -s http://metadata.google.internal/computeMetadata/v1/instance/network-interfaces/0/ip -H "Metadata-Flavor: Google" 2>/dev/null || \
                ip route get 8.8.8.8 | awk '{print $7; exit}' || \
                hostname -I | awk '{print $1}')
else
    BIND_ADDR="${bind_addr}"
fi

log "Detected IP address: $BIND_ADDR"

# Write the Nomad configuration file
log "Writing the Nomad configuration file..."
cat > /etc/nomad.d/nomad.hcl << EOF
datacenter = "${datacenter}"
region     = "dc1"
data_dir   = "/opt/nomad/data"

bind_addr = "$BIND_ADDR"

%{ if server_enabled }
server {
  enabled          = true
  bootstrap_expect = ${bootstrap_expect}
  encrypt          = "${nomad_encrypt_key}"
}
%{ endif }

%{ if client_enabled }
client {
  enabled = true

  host_volume "podman-sock" {
    path      = "/run/podman/podman.sock"
    read_only = false
  }
}
%{ endif }

ui {
  enabled = true
}

addresses {
  http = "0.0.0.0"
  rpc  = "$BIND_ADDR"
  serf = "$BIND_ADDR"
}

ports {
  http = 4646
  rpc  = 4647
  serf = 4648
}

plugin "podman" {
  config {
    volumes {
      enabled = true
    }
  }
}

telemetry {
  collection_interval        = "10s"
  disable_hostname           = false
  prometheus_metrics         = true
  publish_allocation_metrics = true
  publish_node_metrics       = true
}

log_level = "INFO"
log_file  = "/var/log/nomad/nomad.log"
EOF

# Create the systemd unit file
log "Creating the systemd unit file..."
cat > /etc/systemd/system/nomad.service << EOF
[Unit]
Description=Nomad
Documentation=https://www.nomadproject.io/
Requires=network-online.target
After=network-online.target
ConditionFileNotEmpty=/etc/nomad.d/nomad.hcl

[Service]
Type=notify
User=nomad
Group=nomad
ExecStart=/usr/local/bin/nomad agent -config=/etc/nomad.d/nomad.hcl
ExecReload=/bin/kill -HUP \$MAINPID
KillMode=process
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF

# Start the Nomad service
log "Starting the Nomad service..."
systemctl daemon-reload
systemctl enable nomad
systemctl start nomad

# Wait for the service to come up
log "Waiting for Nomad to start..."
sleep 10

# Verify the installation
log "Verifying the Nomad installation..."
if systemctl is-active --quiet nomad; then
    log "✅ Nomad is running"
    log "📊 Node info:"
    /usr/local/bin/nomad node status -self || true
else
    log "❌ Nomad failed to start"
    systemctl status nomad --no-pager || true
    journalctl -u nomad --no-pager -n 20 || true
fi

# Configure the firewall (if present)
log "Configuring firewall rules..."
if command -v ufw >/dev/null 2>&1; then
    ufw allow 4646/tcp  # HTTP API
    ufw allow 4647/tcp  # RPC
    ufw allow 4648/tcp  # Serf
    ufw allow 22/tcp    # SSH
fi

# Create helper aliases and scripts
log "Creating management scripts..."
cat > /usr/local/bin/nomad-status << 'EOF'
#!/bin/bash
echo "=== Nomad service status ==="
systemctl status nomad --no-pager

echo -e "\n=== Nomad cluster members ==="
nomad server members 2>/dev/null || echo "Cannot reach the cluster"

echo -e "\n=== Nomad node status ==="
nomad node status 2>/dev/null || echo "Cannot fetch node status"

echo -e "\n=== Recent logs ==="
journalctl -u nomad --no-pager -n 5
EOF

chmod +x /usr/local/bin/nomad-status

# Add aliases to the ubuntu user's bashrc
echo 'alias ns="nomad-status"' >> /home/ubuntu/.bashrc
echo 'alias nomad-logs="journalctl -u nomad -f"' >> /home/ubuntu/.bashrc

log "🎉 Nomad node configuration complete!"
log "📍 Datacenter: ${datacenter}"
log "🌐 IP address: $BIND_ADDR"
log "🔗 Web UI: http://$BIND_ADDR:4646"
log "📝 Run 'nomad-status' (or 'ns') to check status"

# Publish key info to the motd
cat > /etc/update-motd.d/99-nomad << EOF
#!/bin/bash
echo ""
echo "🚀 Nomad node info:"
echo "  Datacenter: ${datacenter}"
echo "  IP address: $BIND_ADDR"
echo "  Web UI:     http://$BIND_ADDR:4646"
echo "  Status:     nomad-status"
echo ""
EOF

chmod +x /etc/update-motd.d/99-nomad

log "Node bootstrap script finished"
@ -1,54 +0,0 @@

# Traefik static configuration
global:
  sendAnonymousUsage: false

# API and dashboard
api:
  dashboard: true
  insecure: true # test only; use a secured setup in production

# Entry points
entryPoints:
  http:
    address: ":80"
    # Redirect HTTP to HTTPS
    http:
      redirections:
        entryPoint:
          to: https
          scheme: https
  https:
    address: ":443"
  api:
    address: ":8080"

# Providers
providers:
  # File provider for dynamic configuration
  file:
    directory: "/etc/traefik/dynamic"
    watch: true

  # Nomad provider - a static address is fine since the Nomad API is stable
  nomad:
    exposedByDefault: false
    prefix: "traefik"
    refreshInterval: 15s
    stale: false
    watch: true
    endpoint:
      address: "http://127.0.0.1:4646"
      scheme: "http"
    allowEmptyServices: true

# Logging
log:
  level: "INFO"
  format: "json"

accessLog:
  format: "json"
  fields:
    defaultMode: "keep"
    headers:
      defaultMode: "keep"
@ -1,294 +0,0 @@

# LXC Container Browser Automation Environment

## 1. Base LXC Container Setup

```bash
# Create an Ubuntu 22.04 base container
lxc launch ubuntu:22.04 chrome-automation

# Set container resource limits
lxc config set chrome-automation limits.cpu 2
lxc config set chrome-automation limits.memory 4GB

# Map a port (if external access is needed)
lxc config device add chrome-automation proxy-port8080 proxy listen=tcp:0.0.0.0:8080 connect=tcp:127.0.0.1:8080
```

## 2. Environment Setup Inside the Container

### 2.1 Base system packages
```bash
# Enter the container
lxc exec chrome-automation -- bash

# Update the system
apt update && apt upgrade -y

# Install base development tools and graphics support
apt install -y \
    curl \
    wget \
    unzip \
    git \
    build-essential \
    xvfb \
    x11-utils \
    x11-xserver-utils \
    xdg-utils \
    libnss3 \
    libatk-bridge2.0-0 \
    libdrm2 \
    libxkbcommon0 \
    libxcomposite1 \
    libxdamage1 \
    libxrandr2 \
    libgbm1 \
    libxss1 \
    libasound2 \
    fonts-liberation \
    libappindicator3-1 \
    libsecret-1-dev \
    libgconf-2-4
```

### 2.2 Install the Chrome browser
```bash
# Download and install Google Chrome
wget -q -O - https://dl.google.com/linux/linux_signing_key.pub | apt-key add -
echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list
apt update
apt install -y google-chrome-stable
```

### 2.3 Install browser automation tools
```bash
# Install Node.js and npm
curl -fsSL https://deb.nodesource.com/setup_18.x | bash -
apt install -y nodejs

# Install Python and related tooling
apt install -y python3 python3-pip python3-venv

# Install Selenium and the driver manager
pip3 install selenium webdriver-manager

# Download ChromeDriver
npm install -g chromedriver
```

### 2.4 Configure the headless runtime
```bash
# Create the automation script directory
mkdir -p /opt/browser-automation
cd /opt/browser-automation

# Create a headless Chrome launcher script
cat > chrome-headless.sh << 'EOF'
#!/bin/bash
export DISPLAY=:99
Xvfb :99 -screen 0 1024x768x24 > /dev/null 2>&1 &
sleep 2
google-chrome --headless --no-sandbox --disable-dev-shm-usage --disable-gpu --remote-debugging-port=9222 --disable-extensions --disable-plugins --disable-images &
sleep 3
EOF

chmod +x chrome-headless.sh
```
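A quick sanity check (the port matches `--remote-debugging-port` above): if Chrome started correctly, the DevTools endpoint answers with version info.

```bash
curl -s http://127.0.0.1:9222/json/version
```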
|
||||
## 3. 自动化工具配置
|
||||
|
||||
### 3.1 Python Selenium 配置示例
|
||||
```python
|
||||
# selenium_automation.py
|
||||
from selenium import webdriver
|
||||
from selenium.webdriver.chrome.options import Options
|
||||
from selenium.webdriver.chrome.service import Service
|
||||
from webdriver_manager.chrome import ChromeDriverManager
|
||||
|
||||
def create_chrome_driver():
|
||||
chrome_options = Options()
|
||||
chrome_options.add_argument("--headless")
|
||||
chrome_options.add_argument("--no-sandbox")
|
||||
chrome_options.add_argument("--disable-dev-shm-usage")
|
||||
chrome_options.add_argument("--disable-gpu")
|
||||
chrome_options.add_argument("--remote-debugging-port=9222")
|
||||
chrome_options.add_argument("--disable-extensions")
|
||||
chrome_options.add_argument("--disable-plugins")
|
||||
chrome_options.add_argument("--window-size=1920,1080")
|
||||
|
||||
service = Service(ChromeDriverManager().install())
|
||||
driver = webdriver.Chrome(service=service, options=chrome_options)
|
||||
return driver
|
||||
|
||||
# 使用示例
|
||||
driver = create_chrome_driver()
|
||||
driver.get("https://www.example.com")
|
||||
print(driver.title)
|
||||
driver.quit()
|
||||
```
|
||||
|
||||
### 3.2 Node.js Puppeteer 配置示例
|
||||
```javascript
|
||||
// puppeteer_automation.js
|
||||
const puppeteer = require('puppeteer');
|
||||
|
||||
async function runAutomation() {
|
||||
const browser = await puppeteer.launch({
|
||||
headless: true,
|
||||
args: [
|
||||
'--no-sandbox',
|
||||
'--disable-setuid-sandbox',
|
||||
'--disable-dev-shm-usage',
|
||||
'--disable-gpu',
|
||||
'--window-size=1920,1080'
|
||||
]
|
||||
});
|
||||
|
||||
const page = await browser.newPage();
|
||||
await page.goto('https://www.example.com');
|
||||
const title = await page.title();
|
||||
console.log(title);
|
||||
|
||||
await browser.close();
|
||||
}
|
||||
|
||||
runAutomation();
|
||||
```
|
||||
|
||||
## 4. 容器启动配置
|
||||
|
||||
### 4.1 启动脚本
|
||||
```bash
|
||||
cat > /opt/browser-automation/start.sh << 'EOF'
|
||||
#!/bin/bash
|
||||
|
||||
# 启动 Xvfb 虚拟显示
|
||||
export DISPLAY=:99
|
||||
Xvfb :99 -screen 0 1024x768x24 > /dev/null 2>&1 &
|
||||
sleep 2
|
||||
|
||||
# 启动 Chrome 浏览器
|
||||
google-chrome --headless --no-sandbox --disable-dev-shm-usage --disable-gpu --remote-debugging-port=9222 --disable-extensions --disable-plugins --disable-images &
|
||||
sleep 3
|
||||
|
||||
# 可选:启动自动化服务
|
||||
# python3 /opt/browser-automation/service.py
|
||||
|
||||
echo "Browser automation environment ready!"
|
||||
EOF
|
||||
|
||||
chmod +x /opt/browser-automation/start.sh
|
||||
```
|
||||
|
||||
### 4.2 系统服务配置
|
||||
```bash
|
||||
cat > /etc/systemd/system/browser-automation.service << 'EOF'
|
||||
[Unit]
|
||||
Description=Browser Automation Service
|
||||
After=network.target
|
||||
|
||||
[Service]
|
||||
Type=forking
|
||||
ExecStart=/opt/browser-automation/start.sh
|
||||
Restart=always
|
||||
User=root
|
||||
Environment=DISPLAY=:99
|
||||
|
||||
[Install]
|
||||
WantedBy=multi-user.target
|
||||
EOF
|
||||
|
||||
systemctl enable browser-automation.service
|
||||
```
|
||||
|
||||
## 5. 安全配置
|
||||
|
||||
### 5.1 非 root 用户配置
|
||||
```bash
|
||||
# 创建专用用户
|
||||
useradd -m -s /bin/bash browser-user
|
||||
usermod -a -G sudo browser-user
|
||||
|
||||
# 设置 Chrome 以非 root 用户运行
|
||||
echo 'chrome --no-sandbox --user-data-dir=/home/browser-user/.config/google-chrome' > /home/browser-user/run-chrome.sh
|
||||
chown browser-user:browser-user /home/browser-user/run-chrome.sh
|
||||
```
|
||||
|
||||
### 5.2 网络安全
|
||||
```bash
|
||||
# 配置防火墙(如果需要)
|
||||
ufw allow 22/tcp
|
||||
# 仅在需要外部访问时开放特定端口
|
||||
# ufw allow 8080/tcp
|
||||
```
|
||||
|
||||
## 6. 监控和日志
|
||||
|
||||
### 6.1 日志配置
|
||||
```bash
|
||||
# 创建日志目录
|
||||
mkdir -p /var/log/browser-automation
|
||||
|
||||
# 配置日志轮转
|
||||
cat > /etc/logrotate.d/browser-automation << 'EOF'
|
||||
/var/log/browser-automation/*.log {
|
||||
daily
|
||||
missingok
|
||||
rotate 30
|
||||
compress
|
||||
delaycompress
|
||||
notifempty
|
||||
create 644 root root
|
||||
}
|
||||
EOF
|
||||
```

## 7. Backup and Recovery

### 7.1 Creating Container Snapshots
```bash
# Create a snapshot
lxc snapshot chrome-automation initial-setup

# List snapshots
lxc info chrome-automation --snapshots

# Restore a snapshot
lxc restore chrome-automation initial-setup
```

### 7.2 Configuration File Backup
```bash
# Back up the important configuration files
lxc file pull chrome-automation/etc/systemd/system/browser-automation.service ./
lxc file pull chrome-automation/opt/browser-automation/start.sh ./
```
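
Restoring a backed-up file is the symmetric operation; a minimal sketch using the standard `lxc file push` command:

```bash
# Push the backed-up unit file back into the container, then reload systemd
lxc file push ./browser-automation.service chrome-automation/etc/systemd/system/
lxc exec chrome-automation -- systemctl daemon-reload
```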

## 8. Performance Optimization

### 8.1 Chrome Launch Flag Optimization
```bash
CHROME_OPTS="--headless \
    --no-sandbox \
    --disable-dev-shm-usage \
    --disable-gpu \
    --remote-debugging-port=9222 \
    --disable-extensions \
    --disable-plugins \
    --disable-images \
    --disable-javascript \
    --memory-pressure-off \
    --max_old_space_size=4096 \
    --js-flags=--max-old-space-size=2048"
```

### 8.2 Container Resource Optimization
```bash
# Set resource limits in the container configuration
lxc config set chrome-automation limits.cpu 2
lxc config set chrome-automation limits.memory 4GB
lxc config set chrome-automation limits.memory.swap false
```
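
To confirm the limits took effect, the applied configuration can be read back with the standard LXD CLI:

```bash
# Shows the instance's config, including the limits.* keys set above
lxc config show chrome-automation
```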

This setup provides a complete LXC container environment dedicated to browser automation, with good performance, security, and maintainability.

@ -1,50 +0,0 @@
datacenter = "dc1"
data_dir   = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level  = "INFO"
name       = "semaphore"

bind_addr = "192.168.31.149"

addresses {
  http = "192.168.31.149"
  rpc  = "192.168.31.149"
  serf = "192.168.31.149"
}

ports {
  http = 4646
  rpc  = 4647
  serf = 4648
}

server {
  enabled          = true
  bootstrap_expect = 3
  retry_join       = ["semaphore", "ash1d", "ash2e", "ch2", "ch3", "onecloud1", "de"]
}

client {
  enabled = false
}

plugin "nomad-driver-podman" {
  config {
    socket_path = "unix:///run/podman/podman.sock"
    volumes {
      enabled = true
    }
  }
}

consul {
  address = "master:8500,ash3c:8500,warden:8500"
}

vault {
  enabled          = true
  address          = "http://master:8200,http://ash3c:8200,http://warden:8200"
  token            = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
  create_from_role = "nomad-cluster"
  tls_skip_verify  = true
}
@ -1,50 +0,0 @@
datacenter = "dc1"
data_dir   = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level  = "INFO"
name       = "ch3"

bind_addr = "100.116.158.95"

addresses {
  http = "100.116.158.95"
  rpc  = "100.116.158.95"
  serf = "100.116.158.95"
}

ports {
  http = 4646
  rpc  = 4647
  serf = 4648
}

server {
  enabled          = true
  bootstrap_expect = 3
  retry_join       = ["ash1d", "ash2e", "ch2", "ch3", "onecloud1", "de"]
}

client {
  enabled = false
}

plugin "nomad-driver-podman" {
  config {
    socket_path = "unix:///run/podman/podman.sock"
    volumes {
      enabled = true
    }
  }
}

consul {
  address = "master:8500,ash3c:8500,warden:8500"
}

vault {
  enabled          = true
  address          = "http://master:8200,http://ash3c:8200,http://warden:8200"
  token            = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
  create_from_role = "nomad-cluster"
  tls_skip_verify  = true
}
@ -1,50 +0,0 @@
datacenter = "dc1"
data_dir   = "/opt/nomad/data"
plugin_dir = "/opt/nomad/plugins"
log_level  = "INFO"
name       = "ch3"

bind_addr = "100.86.141.112"

addresses {
  http = "100.86.141.112"
  rpc  = "100.86.141.112"
  serf = "100.86.141.112"
}

ports {
  http = 4646
  rpc  = 4647
  serf = 4648
}

server {
  enabled          = true
  bootstrap_expect = 3
  retry_join       = ["100.81.26.3", "100.103.147.94", "100.90.159.68", "100.86.141.112", "100.98.209.50", "100.120.225.29"]
}

client {
  enabled = false
}

plugin "nomad-driver-podman" {
  config {
    socket_path = "unix:///run/podman/podman.sock"
    volumes {
      enabled = true
    }
  }
}

consul {
  address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden
}

vault {
  enabled          = true
  address          = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden
  token            = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
  create_from_role = "nomad-cluster"
  tls_skip_verify  = true
}
@ -1,56 +0,0 @@
# Final Report on Handling Expired Nomad Client Nodes

## Overview
As requested, the three expired client nodes in the Nomad cluster have been processed. These nodes were in the "down" state, and several measures were taken to accelerate their removal.

## Nodes Processed
1. **bj-semaphore** (ID: fa91f05f)
2. **kr-ch2** (ID: 369f60be)
3. **kr-ch3** (ID: 3bd9e893)

## Summary of Actions Taken
1. **Marked as ineligible**
   - All three nodes were marked as ineligible for scheduling (eligibility=ineligible)
   - This guarantees Nomad will no longer place new tasks on them

2. **Forced drain**
   - A forced drain was run on all three nodes
   - Command: `nomad node drain -address=http://100.86.141.112:4646 -enable -force <node-id>`
   - Result: the drain completed on every node

3. **Attempted deletion via the API**
   - Tried to delete the nodes directly through the Nomad API
   - Sent DELETE requests to the Nomad API with curl

4. **Server node restarts**
   - Restarted some Nomad server nodes to force a re-evaluation of cluster state
   - Restarted nodes: ash1d.global.global, ch2.global.global
   - The cluster remained stable with no service interruption

## Current Status
Despite the measures above, the nodes still appear in the node list, but their status is now ineligible and fully drained:
```
ID        Node Pool  DC   Name          Class   Drain  Eligibility  Status
369f60be  default    dc1  kr-ch2        <none>  false  ineligible   down
3bd9e893  default    dc1  kr-ch3        <none>  false  ineligible   down
fa91f05f  default    dc1  bj-semaphore  <none>  false  ineligible   down
```

## Analysis and Recommendations
### Why haven't the nodes been removed yet?
1. By default, Nomad automatically garbage-collects down nodes after 72 hours
2. The nodes may still have state in the backend store (local disk or Consul)
3. Since they are down and marked ineligible, they have no impact on the cluster

### Further Recommendations
1. **Wait for automatic cleanup**: the safest approach is to let Nomad clean these nodes up automatically (72 hours by default)
2. **Manually clean up Consul**: if Nomad uses Consul as a backend store, the node entries can be deleted from Consul directly (proceed with caution)
3. **Remove them from the Ansible inventory**: take the nodes out of configuration management to prevent accidental re-provisioning later
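
One further option, not attempted above, is to trigger Nomad's garbage collector explicitly instead of waiting out the 72-hour window, via the standard system API (the address below is the same server used in the commands above):

```
# Force an immediate GC cycle on the servers; down nodes past their
# threshold are purged right away instead of on the periodic GC timer.
curl -X PUT http://100.86.141.112:4646/v1/system/gc
```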

## Conclusion
All safe and effective measures have been taken to handle these expired nodes. They are now marked ineligible and fully drained, so they no longer affect the cluster. The recommendation is to wait for Nomad's automatic cleanup, or, if immediate removal is truly required, to remove the node definitions from the Ansible inventory.

## Next Steps
1. Monitor the cluster to make sure these nodes do not affect it
2. If the nodes have still not been cleaned up automatically within the next few days, consider more aggressive manual cleanup
3. Update the relevant documentation to record that these nodes have been decommissioned
@ -1,54 +0,0 @@
# Summary of Expired Nomad Client Node Handling

## Objective
Remove three expired client nodes from the Nomad cluster:
1. bj-semaphore (ID: fa91f05f)
2. kr-ch2 (ID: 369f60be)
3. kr-ch3 (ID: 3bd9e893)

## Completed Actions

### 1. Marked the nodes as ineligible
```
nomad node eligibility -address=http://100.86.141.112:4646 -disable fa91f05f
nomad node eligibility -address=http://100.86.141.112:4646 -disable 369f60be
nomad node eligibility -address=http://100.86.141.112:4646 -disable 3bd9e893
```

### 2. Force-drained the nodes
```
nomad node drain -address=http://100.86.141.112:4646 -enable -force fa91f05f
nomad node drain -address=http://100.86.141.112:4646 -enable -force 369f60be
nomad node drain -address=http://100.86.141.112:4646 -enable -force 3bd9e893
```

### 3. Attempted deletion via the API
```
curl -X DELETE http://100.86.141.112:4646/v1/node/fa91f05f-80d7-1b10-a879-a54ba2fb943f
curl -X DELETE http://100.86.141.112:4646/v1/node/369f60be-2640-93f2-94f5-fe95907d0462
curl -X DELETE http://100.86.141.112:4646/v1/node/3bd9e893-aef4-b732-6c07-63739601ccde
```

### 4. Restarted server nodes
- Restarted the ash1d.global.global node
- Restarted the ch2.global.global node
- The cluster remained stable throughout

### 5. Updated configuration management
- Commented out the expired nodes in the Ansible inventory file:
  - ch2 (kr-ch2)
  - ch3 (kr-ch3)
  - semaphoressh (bj-semaphore)

## Current Status
The nodes still show up in the Nomad node list, but they are marked ineligible and fully drained, so they have no impact on the cluster.

## Recommendations
1. Wait for Nomad's automatic cleanup (after 72 hours by default)
2. Monitor the cluster to confirm normal operation
3. If needed, consider more aggressive manual cleanup

## Related Documents
- Detailed operation report: nomad_expired_nodes_final_report.md
- Restart backup plan: nomad_restart_backup_plan.md
- Removal report: nomad_expired_nodes_removal_report.md
@ -1,45 +0,0 @@
# Report on Handling Expired Nomad Client Nodes

## Overview
As requested, the three expired client nodes in the Nomad cluster have been processed. These nodes were in the "down" state, and several measures were taken to accelerate their removal.

## Nodes Processed
1. **bj-semaphore** (ID: fa91f05f)
2. **kr-ch2** (ID: 369f60be)
3. **kr-ch3** (ID: 3bd9e893)

## Actions Taken
1. Marked all three nodes as ineligible for scheduling (eligibility=ineligible)
   - This guarantees Nomad will no longer place new tasks on them
   - Command: `nomad node eligibility -address=http://100.86.141.112:4646 -disable <node-id>`

2. Ran a forced drain on all three nodes
   - Command: `nomad node drain -address=http://100.86.141.112:4646 -enable -force <node-id>`
   - Result: the drain completed on every node

3. Attempted to delete the nodes directly via the API
   - Sent DELETE requests to the Nomad API with curl
   - Command: `curl -X DELETE http://100.86.141.112:4646/v1/node/<node-id>`

## Current Status
The nodes still appear in the list, but their status has been updated:
```
ID        Node Pool  DC   Name          Class   Drain  Eligibility  Status
369f60be  default    dc1  kr-ch2        <none>  false  ineligible   down
3bd9e893  default    dc1  kr-ch3        <none>  false  ineligible   down
fa91f05f  default    dc1  bj-semaphore  <none>  false  ineligible   down
```

## Further Recommendations
If the nodes must be removed completely and immediately, consider the following:

1. **Restart the Nomad servers**: a restart forces a re-evaluation of all node state and usually clears stale nodes
   - Note: this may cause a brief service interruption

2. **Manually clean the node entries in Consul**: if Nomad uses Consul as a backend store, the node entries can be deleted from Consul directly
   - Proceed carefully to avoid affecting healthy nodes

3. **Wait for automatic cleanup**: by default, Nomad garbage-collects down nodes after 72 hours

## Conclusion
Every available measure has been taken to accelerate the removal of these expired nodes. They are now marked ineligible and fully drained, so they do not affect the cluster. If immediate removal is required, restart the Nomad servers.
@ -1,42 +0,0 @@
# Nomad Server Restart Backup Plan

## Overview
This document provides a backup plan and recovery steps for restarting Nomad servers in order to clean up expired nodes.

## Pre-restart Checklist
1. Confirm the current cluster state
2. Record the currently running jobs and allocations
3. Confirm that all critical services have adequate redundancy
4. Notify the relevant teams of the upcoming maintenance

## Restart Procedure
1. Restart a non-leader server first
2. Wait for it to fully recover and rejoin the cluster
3. Verify cluster health
4. Continue restarting the remaining server nodes
5. Restart the leader node last

## Leader Node Restart Steps
1. Ensure at least 3 server nodes are online to maintain quorum
2. On the leader node, run: `systemctl restart nomad`
3. Wait for the service to come back up
4. Verify the node has rejoined the cluster
5. Check whether the expired nodes have been cleaned up
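
Between steps, the cluster state can be verified with the standard Nomad CLI:

```
nomad server members   # lists the server peers and marks the current leader
nomad node status      # confirms clients re-register after each restart
```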

## Rollback Plan
If any issues appear after a restart:
1. Check the Nomad logs: `journalctl -u nomad -f`
2. Verify the configuration files are correct
3. Restore the configuration files from backup if necessary
4. Contact team members for help troubleshooting

## Verification Steps
1. Check cluster status: `nomad node status`
2. Verify all critical jobs are still running
3. Confirm new jobs can be scheduled normally
4. Check the monitoring system for unusual alerts

## Contacts
- Primary contact: [your name]
- Backup contact: [backup contact's name]
- Emergency phone: [phone number]
@ -1,67 +0,0 @@
# 🎯 HashiCorp Stack Operations Brainstorm Log

## 📍 Key Milestones

### ✅ 2025-09-30 Landmark Success
**Nomad fully restored to normal operation**
- **Success indicators**:
  - Nomad server cluster: all 7 nodes online (ch2.global is leader)
  - Nomad client nodes: all 6 nodes in ready state
  - Service status: the nomad service is running normally
- **Key action**: restored Nomad's consul configuration (`address = "master:8500,ash3c:8500,warden:8500"`)

---

### ❌ Current Major Failure
**The Vault job cannot be deployed to the bj-warden node**
- **Symptoms**:
```
* Constraint "${node.unique.name} = bj-warden": 5 nodes excluded by filter
* Constraint "${attr.consul.version} semver >= 1.8.0": 1 nodes excluded by filter
```
- **Root cause discovered**: the consul-cluster job's constraint is `(master|ash3c|hcp)`, so **the warden node is excluded**!
- **Lesson learned**: the service block was previously removed so vault could run standalone, but that cut vault off from consul integration and defeated the project's purpose
- **Deeper problem**: it is not that consul isn't running; **consul is simply not allowed to run on the warden node at all**!

---

## 🎯 Core Conflict
**Vault must integrate with Consul** ←→ **the bj-warden node has no consul**

### 🎯 New idea: tag the Nomad nodes that run consul
**Suggestion**: tag every nomad node that runs consul
- **Advantages**: elegant, scalable, idiomatic Nomad
- **Implementation path** (see the sketch after this list):
  1. Tag the nodes that already run consul (master, ash3c, etc.) with `consul=true`
  2. Change the vault job's constraint to select nodes carrying the consul tag
  3. Optionally tag the warden node too, then deploy consul to it later
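
A minimal sketch of the tagging idea, assuming static node metadata in the client config (the key name `consul` is our choice here, not an existing attribute):

```
# On each node that runs consul: /etc/nomad.d/client.hcl
client {
  meta {
    consul = "true"   # surfaced to the scheduler as ${meta.consul}
  }
}

# In the vault job: replace the node-name constraint with the tag
constraint {
  attribute = "${meta.consul}"
  value     = "true"
}
```

Note that static client meta only takes effect after the nomad agent on that node is restarted.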

---

### 🔍 Current Findings
- All nodes report null Attributes, which suggests a problem with the Nomad client configuration
- Running consul under nomad does not automatically give a node consul attributes
- **Major discovery**: `nomad node status -verbose` and `-json` return inconsistent data!
  - verbose mode shows "consul = true" under Meta
  - JSON output shows Meta as null
  - Possibly a Nomad bug or a data synchronization issue (commands to reproduce below)
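
The discrepancy can be reproduced side by side (node ID is a placeholder):

```
nomad node status -verbose <node-id>            # Meta section shows consul = true
nomad node status -json <node-id> | jq '.Meta'  # reportedly returns null
```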

### 🎯 Next Actions
1. **Investigate why Attributes are null** - check the Nomad client configuration
2. **Consider deploying consul with ansible** - make sure consul runs as a system service
3. **Verify meta data consistency** - resolve the verbose/json discrepancy
4. **Rethink the node-tagging strategy** - base it on the data formats that actually work

---

## 📋 TODO
- [ ] Check the consul configuration on the bj-warden node
- [ ] Start the consul service on the bj-warden node
- [ ] Verify the vault job deploys successfully
- [ ] Confirm the vault-consul integration works

---

## 🚫 Forbidden Operations
- ❌ Removing the vault job's service block (loses the consul integration)
- ❌ Ignoring the consul version constraint (causes compatibility problems)
@ -1,72 +0,0 @@
# Scripts Directory Layout

This directory contains all of the project's script files, organized by function.

## Directory Structure

```
scripts/
├── README.md              # This file
├── setup/                 # Environment setup and initialization scripts
│   ├── init/              # Initialization scripts
│   ├── config/            # Configuration generation scripts
│   └── environment/       # Environment setup scripts
├── deployment/            # Deployment scripts
│   ├── vault/             # Vault deployment scripts
│   ├── consul/            # Consul deployment scripts
│   ├── nomad/             # Nomad deployment scripts
│   └── infrastructure/    # Infrastructure deployment scripts
├── testing/               # Test scripts
│   ├── unit/              # Unit tests
│   ├── integration/       # Integration tests
│   ├── mcp/               # MCP server tests
│   └── infrastructure/    # Infrastructure tests
├── utilities/             # Utility scripts
│   ├── backup/            # Backup utilities
│   ├── monitoring/        # Monitoring utilities
│   ├── maintenance/       # Maintenance utilities
│   └── helpers/           # Helper utilities
├── mcp/                   # MCP server scripts
│   ├── servers/           # MCP server implementations
│   ├── configs/           # MCP configuration scripts
│   └── tools/             # MCP tooling scripts
└── ci-cd/                 # CI/CD scripts
    ├── build/             # Build scripts
    ├── deploy/            # Deploy scripts
    └── quality/           # Code quality check scripts
```

## Script Naming Conventions

- Use lowercase letters separated by hyphens
- Use clear functional prefixes:
  - `init-` : initialization scripts
  - `deploy-` : deployment scripts
  - `test-` : test scripts
  - `backup-` : backup scripts
  - `monitor-` : monitoring scripts
  - `setup-` : setup scripts

## Usage Notes

1. All scripts should have execute permission
2. Scripts should include proper error handling
3. Important operations should prompt for confirmation
4. Scripts should support a `--help` flag that prints usage (see the skeleton below)
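
A minimal skeleton for the conventions above (the script body itself is hypothetical, shown only as a pattern):

```bash
#!/bin/bash
set -euo pipefail

usage() {
    echo "Usage: $(basename "$0") [--help]"
    echo "Describe what the script does here."
}

# Print usage and exit when --help is requested
if [[ "${1:-}" == "--help" ]]; then
    usage
    exit 0
fi
```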

## Quick Access

Quick access to commonly used scripts:

```bash
# Testing
make test                                        # Run all tests
./scripts/testing/mcp/test-all-mcp-servers.sh

# Deployment
./scripts/deployment/vault/deploy-vault-dev.sh
./scripts/deployment/consul/deploy-consul-cluster.sh

# Utilities
./scripts/utilities/backup/backup-all.sh
./scripts/utilities/monitoring/health-check.sh
```
@ -1,113 +0,0 @@
# Script Index

This file lists all organized scripts and what they do.

## Setup and Initialization Scripts (setup/)

### Initialization Scripts (setup/init/)
- `init-vault-dev.sh` - Initialize Vault for the development environment
- `init-vault-dev-api.sh` - Initialize the development Vault via the API
- `init-vault-cluster.sh` - Initialize the Vault cluster

### Configuration Generation Scripts (setup/config/)
- `setup-consul-cluster-variables.sh` - Set the Consul cluster variables
- `setup-consul-variables-and-storage.sh` - Set the Consul variables and storage
- `generate-consul-config.sh` - Generate the Consul configuration files

## Deployment Scripts (deployment/)

### Vault Deployment (deployment/vault/)
- `deploy-vault.sh` - Deploy Vault
- `vault-dev-example.sh` - Vault development environment examples
- `vault-dev-quickstart.sh` - Vault development environment quick start

### Consul Deployment (deployment/consul/)
- `deploy-consul-cluster-kv.sh` - Deploy the Consul cluster (KV-backed)
- `consul-variables-example.sh` - Consul variables example

## Test Scripts (testing/)

### Main Test Runner (testing/)
- `test-runner.sh` - Main test runner

### Integration Tests (testing/integration/)
- `verify-vault-consul-integration.sh` - Verify the Vault-Consul integration

### Infrastructure Tests (testing/infrastructure/)
- `test-nomad-config.sh` - Test the Nomad configuration
- `test-traefik-deployment.sh` - Test the Traefik deployment

### MCP Tests (testing/mcp/)
- `test_direct_search.sh` - Direct search test
- `test_local_mcp_servers.sh` - Local MCP server test
- `test_mcp_interface.sh` - MCP interface test
- `test_mcp_search_final.sh` - Final MCP search test
- `test_mcp_servers.sh` - MCP server test
- `test_qdrant_ollama_tools.sh` - Qdrant Ollama tools test
- `test_qdrant_ollama_tools_fixed.sh` - Qdrant Ollama tools test (fixed)
- `test_search_documents.sh` - Document search test
- `test_mcp_servers_comprehensive.py` - Comprehensive MCP server test (Python)
- `test_mcp_servers_improved.py` - Improved MCP server test (Python)
- `test_mcp_servers_simple.py` - Simple MCP server test (Python)
- `test_qdrant_ollama_server.py` - Qdrant Ollama server test (Python)

## Utility Scripts (utilities/)

### Backup Utilities (utilities/backup/)
- `backup-consul.sh` - Back up Consul data

### Maintenance Utilities (utilities/maintenance/)
- `cleanup-global-config.sh` - Clean up the global configuration

### Helper Utilities (utilities/helpers/)
- `show-vault-dev-keys.sh` - Show the development Vault keys
- `nomad-leader-discovery.sh` - Nomad leader discovery
- `manage-vault-consul.sh` - Manage Vault-Consul
- `fix-alpine-cgroups.sh` - Fix Alpine cgroups
- `fix-alpine-cgroups-systemd.sh` - Fix Alpine cgroups (systemd)

## MCP Scripts (mcp/)

### MCP Servers (mcp/servers/)
- `qdrant-mcp-server.py` - Qdrant MCP server
- `qdrant-ollama-integration.py` - Qdrant Ollama integration
- `qdrant-ollama-mcp-server.py` - Qdrant Ollama MCP server

### MCP Configuration (mcp/configs/)
- `sync-all-configs.sh` - Sync all MCP configurations

### MCP Tools (mcp/tools/)
- `start-mcp-server.sh` - Start the MCP server

## Usage

### Quick Start Commands

```bash
# Run all tests
./scripts/testing/test-runner.sh

# Initialize the development environment
./scripts/setup/init/init-vault-dev.sh

# Deploy the Consul cluster
./scripts/deployment/consul/deploy-consul-cluster-kv.sh

# Start the MCP server
./scripts/mcp/tools/start-mcp-server.sh

# Back up Consul
./scripts/utilities/backup/backup-consul.sh
```

### Permissions

Make sure all scripts are executable:

```bash
find scripts/ -name "*.sh" -exec chmod +x {} \;
```

### Environment Variables

Some scripts require specific environment variables; see the comments in each script.
@ -1,178 +0,0 @@
#!/bin/bash

# Documentation generation script
# Automatically generates the project documentation

set -euo pipefail

# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color

# Logging helpers
log_info() {
    echo -e "${BLUE}[INFO]${NC} $1"
}

log_success() {
    echo -e "${GREEN}[SUCCESS]${NC} $1"
}

log_warning() {
    echo -e "${YELLOW}[WARNING]${NC} $1"
}

log_error() {
    echo -e "${RED}[ERROR]${NC} $1"
}

# Generate the script documentation
generate_script_docs() {
    log_info "Generating the script documentation..."

    local doc_file="docs/SCRIPTS.md"
    mkdir -p "$(dirname "$doc_file")"

    cat > "$doc_file" << 'EOF'
# Script Documentation

This document is generated automatically and describes every script in the project.

## Script List

EOF

    # Walk the scripts directory
    find scripts/ -name "*.sh" -type f | sort | while read -r script; do
        echo "### $script" >> "$doc_file"
        echo "" >> "$doc_file"

        # Extract the script description (from its header comments)
        local description
        description=$(head -n 10 "$script" | grep "^#" | grep -v "^#!/" | head -n 3 | sed 's/^# *//' || echo "No description")

        echo "**Description**: $description" >> "$doc_file"
        echo "" >> "$doc_file"

        # Check whether the script documents its usage
        if grep -q "Usage:" "$script" || grep -q "用法:" "$script"; then
            echo "**Usage**: see the notes inside the script" >> "$doc_file"
        fi

        echo "" >> "$doc_file"
    done

    log_success "Script documentation generated: $doc_file"
}

# Generate the API documentation
generate_api_docs() {
    log_info "Generating the API documentation..."

    local doc_file="docs/API.md"

    cat > "$doc_file" << 'EOF'
# API Documentation

## MCP Server APIs

### Qdrant MCP Server

- **Port**: 3000
- **Protocol**: HTTP/JSON-RPC
- **Purpose**: vector search and document management

### Main Endpoints

- `/search` - search documents
- `/add` - add documents
- `/delete` - delete documents

See each MCP server's source code for more details.
EOF

    log_success "API documentation generated: $doc_file"
}

# Generate the deployment documentation
generate_deployment_docs() {
    log_info "Generating the deployment documentation..."

    local doc_file="docs/DEPLOYMENT.md"

    cat > "$doc_file" << 'EOF'
# Deployment Documentation

## Quick Start

1. Environment setup
```bash
make setup
```

2. Initialize the services
```bash
./scripts/setup/init/init-vault-dev.sh
./scripts/deployment/consul/deploy-consul-cluster-kv.sh
```

3. Start the MCP server
```bash
./scripts/mcp/tools/start-mcp-server.sh
```

## Detailed Deployment Steps

See the individual component deployment scripts and configuration files.
EOF

    log_success "Deployment documentation generated: $doc_file"
}

# Update the main README
update_main_readme() {
    log_info "Updating the main README..."

    # Back up the original README
    if [ -f "README.md" ]; then
        cp "README.md" "README.md.backup"
    fi

    # Append the script reorganization notes to the README
    cat >> "README.md" << 'EOF'

## Script Organization

Project scripts have been reorganized by function under the `scripts/` directory:

- `scripts/setup/` - environment setup and initialization
- `scripts/deployment/` - deployment scripts
- `scripts/testing/` - test scripts
- `scripts/utilities/` - utility scripts
- `scripts/mcp/` - MCP server scripts
- `scripts/ci-cd/` - CI/CD scripts

See the [script index](scripts/SCRIPT_INDEX.md) for details.

EOF

    log_success "Main README updated"
}

# Main entry point
main() {
    log_info "Starting documentation generation..."

    generate_script_docs
    generate_api_docs
    generate_deployment_docs
    update_main_readme

    log_success "Documentation generation complete!"
}

# Run main
main "$@"
@ -1,231 +0,0 @@
#!/bin/bash

# Code quality check script
# Checks script syntax, code style, and related issues

set -euo pipefail

# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color

# Counters
TOTAL_FILES=0
PASSED_FILES=0
FAILED_FILES=0

# Logging helpers
log_info() {
    echo -e "${BLUE}[INFO]${NC} $1"
}

log_success() {
    echo -e "${GREEN}[SUCCESS]${NC} $1"
}

log_warning() {
    echo -e "${YELLOW}[WARNING]${NC} $1"
}

log_error() {
    echo -e "${RED}[ERROR]${NC} $1"
}

# Check shell script syntax
check_shell_syntax() {
    log_info "Checking shell script syntax..."

    local shell_files
    shell_files=$(find scripts/ -name "*.sh" -type f)

    if [ -z "$shell_files" ]; then
        log_warning "No shell scripts found"
        return 0
    fi

    while IFS= read -r file; do
        ((TOTAL_FILES += 1))
        log_info "Checking: $file"

        if bash -n "$file"; then
            log_success "✓ $file"
            ((PASSED_FILES += 1))
        else
            log_error "✗ $file - syntax error"
            ((FAILED_FILES += 1))
        fi
    done <<< "$shell_files"
}

# Check Python script syntax
check_python_syntax() {
    log_info "Checking Python script syntax..."

    local python_files
    python_files=$(find scripts/ -name "*.py" -type f)

    if [ -z "$python_files" ]; then
        log_warning "No Python scripts found"
        return 0
    fi

    while IFS= read -r file; do
        ((TOTAL_FILES += 1))
        log_info "Checking: $file"

        if python3 -m py_compile "$file" 2>/dev/null; then
            log_success "✓ $file"
            ((PASSED_FILES += 1))
        else
            log_error "✗ $file - syntax error"
            ((FAILED_FILES += 1))
        fi
    done <<< "$python_files"
}

# Check script permissions
check_script_permissions() {
    log_info "Checking script execute permissions..."

    local script_files
    script_files=$(find scripts/ -name "*.sh" -type f)

    if [ -z "$script_files" ]; then
        log_warning "No script files found"
        return 0
    fi

    local permission_issues=0

    while IFS= read -r file; do
        if [ ! -x "$file" ]; then
            log_warning "⚠ $file - missing execute permission"
            ((permission_issues += 1))
        fi
    done <<< "$script_files"

    if [ "$permission_issues" -eq 0 ]; then
        log_success "All scripts have execute permission"
    else
        log_warning "Found $permission_issues permission issues"
        log_info "Fix the permissions with: find scripts/ -name '*.sh' -exec chmod +x {} \\;"
    fi
}

# Check script headers
check_script_headers() {
    log_info "Checking script headers..."

    local script_files
    script_files=$(find scripts/ -name "*.sh" -type f)

    if [ -z "$script_files" ]; then
        log_warning "No script files found"
        return 0
    fi

    local header_issues=0

    while IFS= read -r file; do
        local first_line
        first_line=$(head -n 1 "$file")

        if [[ ! "$first_line" =~ ^#!/bin/bash ]] && [[ ! "$first_line" =~ ^#!/usr/bin/env\ bash ]]; then
            log_warning "⚠ $file - missing or incorrect shebang"
            ((header_issues += 1))
        fi
    done <<< "$script_files"

    if [ "$header_issues" -eq 0 ]; then
        log_success "All scripts have a correct shebang"
    else
        log_warning "Found $header_issues shebang issues"
    fi
}

# Check configuration file syntax
check_config_syntax() {
    log_info "Checking configuration file syntax..."

    # Check JSON files
    local json_files
    json_files=$(find . -name "*.json" -type f -not -path "./.git/*")

    if [ -n "$json_files" ]; then
        while IFS= read -r file; do
            ((TOTAL_FILES += 1))
            log_info "Checking JSON: $file"

            if jq empty "$file" 2>/dev/null; then
                log_success "✓ $file"
                ((PASSED_FILES += 1))
            else
                log_error "✗ $file - JSON syntax error"
                ((FAILED_FILES += 1))
            fi
        done <<< "$json_files"
    fi

    # Check YAML files
    local yaml_files
    yaml_files=$(find . \( -name "*.yml" -o -name "*.yaml" \) -type f -not -path "./.git/*")

    if [ -n "$yaml_files" ] && command -v yamllint &> /dev/null; then
        while IFS= read -r file; do
            ((TOTAL_FILES += 1))
            log_info "Checking YAML: $file"

            if yamllint "$file" 2>/dev/null; then
                log_success "✓ $file"
                ((PASSED_FILES += 1))
            else
                log_error "✗ $file - YAML syntax error"
                ((FAILED_FILES += 1))
            fi
        done <<< "$yaml_files"
    elif [ -n "$yaml_files" ]; then
        log_warning "yamllint is not installed; skipping the YAML checks"
    fi
}

# Generate the report
generate_report() {
    log_info "Generating the check report..."

    echo
    echo "=================================="
    echo "      Code Quality Report"
    echo "=================================="
    echo "Total files: $TOTAL_FILES"
    echo "Passed: $PASSED_FILES"
    echo "Failed: $FAILED_FILES"
    echo "Success rate: $(( PASSED_FILES * 100 / (TOTAL_FILES == 0 ? 1 : TOTAL_FILES) ))%"
    echo "=================================="

    if [ "$FAILED_FILES" -eq 0 ]; then
        log_success "All checks passed!"
        return 0
    else
        log_error "Found $FAILED_FILES problems; fix them and rerun"
        return 1
    fi
}

# Main entry point
main() {
    log_info "Starting the code quality checks..."

    check_shell_syntax
    check_python_syntax
    check_script_permissions
    check_script_headers
    check_config_syntax

    generate_report
}

# Run main
main "$@"
@ -1,142 +0,0 @@
#!/bin/bash

# Security scan script
# Scans the code for security issues and leaked secrets

set -euo pipefail

# Colors
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color

# Counters
TOTAL_ISSUES=0
HIGH_ISSUES=0
MEDIUM_ISSUES=0
LOW_ISSUES=0

# Logging helpers
log_info() {
    echo -e "${BLUE}[INFO]${NC} $1"
}

log_success() {
    echo -e "${GREEN}[SUCCESS]${NC} $1"
}

log_warning() {
    echo -e "${YELLOW}[WARNING]${NC} $1"
}

log_error() {
    echo -e "${RED}[ERROR]${NC} $1"
}

# Check for leaked secrets
check_secrets() {
    log_info "Checking for leaked secrets..."

    local patterns=(
        "password\s*=\s*['\"][^'\"]*['\"]"
        "token\s*=\s*['\"][^'\"]*['\"]"
        "api_key\s*=\s*['\"][^'\"]*['\"]"
        "secret\s*=\s*['\"][^'\"]*['\"]"
        "private_key"
        "-----BEGIN.*PRIVATE KEY-----"
    )

    local found_secrets=0

    for pattern in "${patterns[@]}"; do
        local matches
        matches=$(grep -r -i -E "$pattern" . --exclude-dir=.git --exclude-dir=backups 2>/dev/null || true)

        if [ -n "$matches" ]; then
            log_error "Possible secrets found:"
            echo "$matches"
            ((found_secrets += 1))
            ((HIGH_ISSUES += 1))
        fi
    done

    if [ "$found_secrets" -eq 0 ]; then
        log_success "No obvious secret leaks found"
    else
        log_error "Found $found_secrets kinds of possible secrets; review and remove them"
    fi

    TOTAL_ISSUES=$((TOTAL_ISSUES + found_secrets))
}

# Check for unsafe command usage
check_unsafe_commands() {
    log_info "Checking for unsafe command usage..."

    local unsafe_patterns=(
        "rm\s+-rf\s+/"
        "chmod\s+777"
        "curl.*-k"
        "wget.*--no-check-certificate"
    )

    local unsafe_found=0

    for pattern in "${unsafe_patterns[@]}"; do
        local matches
        matches=$(grep -r -E "$pattern" scripts/ 2>/dev/null || true)

        if [ -n "$matches" ]; then
            log_warning "Possibly unsafe command usage found:"
            echo "$matches"
            ((unsafe_found += 1))
            ((MEDIUM_ISSUES += 1))
        fi
    done

    if [ "$unsafe_found" -eq 0 ]; then
        log_success "No obviously unsafe command usage found"
    else
        log_warning "Found $unsafe_found possibly unsafe commands; please review"
    fi

    TOTAL_ISSUES=$((TOTAL_ISSUES + unsafe_found))
}

# Generate the report
generate_report() {
    log_info "Generating the security scan report..."

    echo
    echo "=================================="
    echo "      Security Scan Report"
    echo "=================================="
    echo "Total issues: $TOTAL_ISSUES"
    echo "High: $HIGH_ISSUES"
    echo "Medium: $MEDIUM_ISSUES"
    echo "Low: $LOW_ISSUES"
    echo "=================================="

    if [ "$TOTAL_ISSUES" -eq 0 ]; then
        log_success "Security scan passed with no findings!"
        return 0
    else
        log_warning "Found $TOTAL_ISSUES security issues; review and fix them"
        return 1
    fi
}

# Main entry point
main() {
    log_info "Starting the security scan..."

    check_secrets
    check_unsafe_commands

    generate_report
}

# Run main
main "$@"
@ -0,0 +1,58 @@
#!/bin/bash

# Deploy a Consul Client onto every Nomad Server

echo "🚀 Deploying Consul Client to all Nomad Server nodes"
echo "================================================"

# Deploy the Consul client
echo "1. Deploying Consul Client..."
ansible-playbook -i ansible/inventory/hosts.yml \
    ansible/consul-client-deployment.yml \
    --limit nomad_servers

if [ $? -eq 0 ]; then
    echo "✅ Consul Client deployed successfully"
else
    echo "❌ Consul Client deployment failed"
    exit 1
fi

# Update the Nomad configuration
echo ""
echo "2. Updating Nomad Server configuration..."
echo "The configuration of each Nomad Server must be updated manually:"
echo ""
echo "Edit the consul block in /etc/nomad.d/nomad.hcl:"
echo "consul {"
echo "  address = \"127.0.0.1:8500\"  # switch to the local agent"
echo "  server_service_name = \"nomad\""
echo "  client_service_name = \"nomad-client\""
echo "  auto_advertise = true"
echo "  server_auto_join = true"
echo "  client_auto_join = false"
echo "}"
echo ""
echo "Then restart the Nomad service:"
echo "systemctl restart nomad"

echo ""
echo "3. Verifying deployment..."
sleep 5

# Verify the Consul clients
for server in semaphore ch3 ash1d ash2e ch2 de onecloud1; do
    echo "Checking $server..."
    if curl -s http://$server.tailnet-68f9.ts.net:8500/v1/status/leader > /dev/null 2>&1; then
        echo "✅ $server - Consul Client is responding"
    else
        echo "❌ $server - Consul Client is not responding"
    fi
done

echo ""
echo "🎉 Deployment finished!"
echo "Next steps:"
echo "1. Manually update each Nomad Server's configuration file"
echo "2. Restart the Nomad service"
echo "3. Verify the Nomad-Consul integration"
@ -1,217 +0,0 @@
#!/bin/bash

# Consul variables and storage configuration example
# Demonstrates how to configure Consul's KV variables and storage features

set -e

# Configuration parameters
CONSUL_ADDR=${CONSUL_ADDR:-"http://localhost:8500"}
ENVIRONMENT=${ENVIRONMENT:-"dev"}
PROVIDER=${PROVIDER:-"oracle"}
REGION=${REGION:-"kr"}

echo "Consul variables and storage configuration example"
echo "========================="
echo "Consul address: $CONSUL_ADDR"
echo "Environment: $ENVIRONMENT"
echo "Provider: $PROVIDER"
echo "Region: $REGION"
echo ""

# Check the Consul connection
check_consul_connection() {
    echo "Checking the Consul connection..."
    if curl -s "$CONSUL_ADDR/v1/status/leader" > /dev/null; then
        echo "✓ Consul connection OK"
    else
        echo "✗ Cannot connect to Consul; check that the Consul service is running"
        exit 1
    fi
}

# Configure application variables
configure_app_variables() {
    echo "Configuring application variables..."

    # Basic application info
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/app/name" -d "my-application"
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/app/version" -d "1.0.0"
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/app/environment" -d "$ENVIRONMENT"

    # Feature flags
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/features/new_ui" -d "true"
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/features/advanced_analytics" -d "false"
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/features/beta_features" -d "true"

    echo "✓ Application variables configured"
}

# Configure database variables
configure_database_variables() {
    echo "Configuring database variables..."

    # Database connection info
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/database/host" -d "db.example.com"
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/database/port" -d "5432"
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/database/name" -d "myapp_db"
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/database/ssl_mode" -d "require"

    # Connection pool settings
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/database/max_connections" -d "100"
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/database/min_connections" -d "10"
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/database/connection_timeout" -d "30s"

    echo "✓ Database variables configured"
}

# Configure cache variables
configure_cache_variables() {
    echo "Configuring cache variables..."

    # Redis settings
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/cache/host" -d "redis.example.com"
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/cache/port" -d "6379"
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/cache/password" -d "secure_password"
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/cache/db" -d "0"

    # Cache policy
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/cache/ttl" -d "3600"
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/cache/max_memory" -d "2gb"
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/cache/eviction_policy" -d "allkeys-lru"

    echo "✓ Cache variables configured"
}

# Configure message queue variables
configure_messaging_variables() {
    echo "Configuring message queue variables..."

    # RabbitMQ settings
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/messaging/host" -d "rabbitmq.example.com"
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/messaging/port" -d "5672"
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/messaging/username" -d "myapp"
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/messaging/password" -d "secure_password"
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/messaging/vhost" -d "/myapp"

    # Queue settings
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/messaging/queue_name" -d "tasks"
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/messaging/exchange" -d "myapp_exchange"
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/messaging/routing_key" -d "task.#"

    echo "✓ Message queue variables configured"
}

# Configure cloud provider variables
configure_provider_variables() {
    echo "Configuring cloud provider variables..."

    if [ "$PROVIDER" = "oracle" ]; then
        # Oracle Cloud settings
        curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$REGION/tenancy_ocid" -d "ocid1.tenancy.oc1..aaaaaaaayourtenancyocid"
        curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$REGION/user_ocid" -d "ocid1.user.oc1..aaaaaaaayouruserocid"
        curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$REGION/fingerprint" -d "your-fingerprint"
        curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$REGION/region" -d "$REGION"
        curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$REGION/compartment_id" -d "ocid1.compartment.oc1..aaaaaaaayourcompartmentid"
    elif [ "$PROVIDER" = "aws" ]; then
        # AWS settings
        curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$REGION/access_key" -d "your-access-key"
        curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$REGION/secret_key" -d "your-secret-key"
        curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$REGION/region" -d "$REGION"
    elif [ "$PROVIDER" = "gcp" ]; then
        # GCP settings
        curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$REGION/project_id" -d "your-project-id"
        curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$REGION/region" -d "$REGION"
        curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$REGION/credentials_path" -d "/path/to/service-account.json"
    elif [ "$PROVIDER" = "digitalocean" ]; then
        # DigitalOcean settings
        curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$REGION/token" -d "your-do-token"
        curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$REGION/region" -d "$REGION"
    fi

    echo "✓ Cloud provider variables configured"
}

# Configure storage-related variables
configure_storage_variables() {
    echo "Configuring storage-related variables..."

    # Snapshot settings
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/snapshot/enabled" -d "true"
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/snapshot/interval" -d "24h"
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/snapshot/retain" -d "30"
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/snapshot/name" -d "consul-snapshot-{{.Timestamp}}"

    # Backup settings
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/backup/enabled" -d "true"
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/backup/interval" -d "6h"
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/backup/retain" -d "7"
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/backup/name" -d "consul-backup-{{.Timestamp}}"

    # Data directory settings
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/data_dir" -d "/opt/consul/data"
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/raft_dir" -d "/opt/consul/raft"

    # Autopilot settings
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/autopilot/cleanup_dead_servers" -d "true"
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/autopilot/last_contact_threshold" -d "200ms"
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/autopilot/max_trailing_logs" -d "250"
    curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/autopilot/server_stabilization_time" -d "10s"

    echo "✓ Storage-related variables configured"
}

# Display the resulting configuration
display_configuration() {
    echo ""
    echo "Configuration results:"
    echo "========="

    echo "Application config:"
    curl -s "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/app/?recurse" | jq -r '.[] | "\(.Key): \(.Value | @base64d)"' 2>/dev/null || echo "  (install jq for formatted output)"

    echo ""
    echo "Database config:"
    curl -s "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/database/?recurse" | jq -r '.[] | "\(.Key): \(.Value | @base64d)"' 2>/dev/null || echo "  (install jq for formatted output)"

    echo ""
    echo "Cache config:"
    curl -s "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/cache/?recurse" | jq -r '.[] | "\(.Key): \(.Value | @base64d)"' 2>/dev/null || echo "  (install jq for formatted output)"

    echo ""
    echo "Message queue config:"
    curl -s "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/messaging/?recurse" | jq -r '.[] | "\(.Key): \(.Value | @base64d)"' 2>/dev/null || echo "  (install jq for formatted output)"

    echo ""
    echo "Cloud provider config:"
    curl -s "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/?recurse" | jq -r '.[] | "\(.Key): \(.Value | @base64d)"' 2>/dev/null || echo "  (install jq for formatted output)"

    echo ""
    echo "Storage config:"
    curl -s "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/?recurse" | jq -r '.[] | "\(.Key): \(.Value | @base64d)"' 2>/dev/null || echo "  (install jq for formatted output)"
}
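
# Illustrative read-back helper (not called above): Consul's KV API returns the
# stored value directly when the ?raw query parameter is used, so no base64
# decoding or jq is needed. Example: get_config "database/host" -> db.example.com
get_config() {
    curl -s "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$1?raw"
}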

# Main entry point
main() {
    check_consul_connection
    configure_app_variables
    configure_database_variables
    configure_cache_variables
    configure_messaging_variables
    configure_provider_variables
    configure_storage_variables
    display_configuration

    echo ""
    echo "✓ All variables and storage settings configured!"
    echo ""
    echo "Usage notes:"
    echo "1. Read these settings from Terraform with the consul_keys data source"
    echo "2. Read them from applications with a Consul client library"
    echo "3. View and manage them in the Consul UI"
    echo ""
    echo "Reference document: /root/mgmt/docs/setup/consul_variables_and_storage_guide.md"
}

# Run main
main "$@"
@ -1,117 +0,0 @@
#!/bin/bash

# Consul cluster deployment script - follows the variable naming best practice
# Deploys a Consul cluster that fully follows the config/{environment}/{provider}/{region_or_service}/{key} format

set -e

# Configuration parameters
CONSUL_ADDR="${CONSUL_ADDR:-localhost:8500}"
ENVIRONMENT="${ENVIRONMENT:-dev}"
NOMAD_ADDR="${NOMAD_ADDR:-localhost:4646}"
CONSUL_CONFIG_DIR="${CONSUL_CONFIG_DIR:-/root/mgmt/components/consul/configs}"
CONSUL_JOBS_DIR="${CONSUL_JOBS_DIR:-/root/mgmt/components/consul/jobs}"

echo "Starting deployment of a Consul cluster that follows the naming best practice..."
echo "Consul address: $CONSUL_ADDR"
echo "Nomad address: $NOMAD_ADDR"
echo "Environment: $ENVIRONMENT"

# Check the Consul connection
echo "Checking the Consul connection..."
if ! curl -s "$CONSUL_ADDR/v1/status/leader" | grep -q "."; then
    echo "Error: cannot connect to the Consul server $CONSUL_ADDR"
    exit 1
fi
echo "Consul connection OK"

# Check the Nomad connection
echo "Checking the Nomad connection..."
if ! curl -s "$NOMAD_ADDR/v1/status/leader" | grep -q "."; then
    echo "Error: cannot connect to the Nomad server $NOMAD_ADDR"
    exit 1
fi
echo "Nomad connection OK"

# Step 1: set the Consul variables
echo "Step 1: setting the Consul variables..."
/root/mgmt/deployment/scripts/setup_consul_cluster_variables.sh

# Step 2: generate the Consul configuration files
echo "Step 2: generating the Consul configuration files..."
/root/mgmt/deployment/scripts/generate_consul_config.sh

# Step 3: stop any existing Consul cluster jobs
echo "Step 3: stopping any existing Consul cluster jobs..."
if nomad job status consul-cluster-simple 2>/dev/null; then
    nomad job stop consul-cluster-simple
    echo "Stopped the existing consul-cluster-simple job"
fi

if nomad job status consul-cluster-dynamic 2>/dev/null; then
    nomad job stop consul-cluster-dynamic
    echo "Stopped the existing consul-cluster-dynamic job"
fi

if nomad job status consul-cluster-kv 2>/dev/null; then
    nomad job stop consul-cluster-kv
    echo "Stopped the existing consul-cluster-kv job"
fi

# Step 4: deploy the new Consul cluster
echo "Step 4: deploying the new Consul cluster..."
nomad job run "$CONSUL_JOBS_DIR/consul-cluster-kv.nomad"

# Step 5: verify the deployment
echo "Step 5: verifying the deployment..."
sleep 10

# Check the job status
if nomad job status consul-cluster-kv | grep -q "running"; then
    echo "The Consul cluster job is running"
else
    echo "Error: the Consul cluster job is not running"
    exit 1
fi

# Check the Consul cluster status
if curl -s "$CONSUL_ADDR/v1/status/leader" | grep -q "."; then
    echo "A Consul cluster leader has been elected"
else
    echo "Error: no Consul cluster leader has been elected"
    exit 1
fi

# Check the node count
NODE_COUNT=$(curl -s "$CONSUL_ADDR/v1/status/peers" | jq '. | length')
if [ "$NODE_COUNT" -eq 3 ]; then
    echo "Consul cluster node count is correct: $NODE_COUNT"
else
    echo "Warning: unexpected Consul cluster node count: $NODE_COUNT (expected: 3)"
fi

# Step 6: verify the variable configuration
echo "Step 6: verifying the variable configuration..."

# Check a few key variables
if curl -s "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/cluster/datacenter" | jq -r '.[].Value' | base64 -d | grep -q "dc1"; then
    echo "Consul datacenter configuration is correct"
else
    echo "Warning: the Consul datacenter configuration may be incorrect"
fi

if curl -s "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/nodes/master/ip" | jq -r '.[].Value' | base64 -d | grep -q "100.117.106.136"; then
    echo "Consul master node IP configuration is correct"
else
    echo "Warning: the Consul master node IP configuration may be incorrect"
fi

# Step 7: show access information
echo "Step 7: access information..."
echo "Consul UI: http://100.117.106.136:8500"
echo "Consul API: http://100.117.106.136:8500/v1"
echo "Nomad UI: http://100.117.106.136:4646"
echo "Nomad API: http://100.117.106.136:4646/v1"

echo "Consul cluster deployment complete!"
echo "The cluster now fully follows the naming best practice: config/{environment}/{provider}/{region_or_service}/{key}"
@ -1,143 +0,0 @@
#!/bin/bash
# Script to deploy the Vault cluster

# Check for and install Vault if needed
if ! which vault >/dev/null; then
    echo "==== Installing Vault ===="
    VAULT_VERSION="1.20.4"
    wget -q https://releases.hashicorp.com/vault/${VAULT_VERSION}/vault_${VAULT_VERSION}_linux_amd64.zip
    unzip -q vault_${VAULT_VERSION}_linux_amd64.zip
    sudo mv vault /usr/local/bin/
    rm vault_${VAULT_VERSION}_linux_amd64.zip
fi

export PATH=$PATH:/usr/local/bin

set -e

echo "===== Starting Vault cluster deployment ====="

# Directories
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
ROOT_DIR="$(dirname "$SCRIPT_DIR")"
ANSIBLE_DIR="$ROOT_DIR/playbooks"
JOBS_DIR="$ROOT_DIR/components/vault/jobs"

# Colors
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
RED='\033[0;31m'
NC='\033[0m' # No Color

# Helper functions
log_info() {
    echo -e "${GREEN}[INFO]${NC} $1"
}

log_warn() {
    echo -e "${YELLOW}[WARN]${NC} $1"
}

log_error() {
    echo -e "${RED}[ERROR]${NC} $1"
}

# Check that a command exists
check_command() {
    if ! command -v $1 &> /dev/null; then
        log_error "$1 command not found; please install it first"
        exit 1
    fi
}

# Check the required commands
check_command ansible-playbook
check_command nomad
check_command vault
check_command jq

# Step 1: install Vault with Ansible
log_info "Step 1: installing Vault with Ansible..."
ansible-playbook -i "$ANSIBLE_DIR/inventories/production/vault.ini" "$ANSIBLE_DIR/playbooks/install/install_vault.yml"

# Step 2: deploy the Vault Nomad job
log_info "Step 2: deploying the Vault Nomad job..."
nomad job run "$JOBS_DIR/vault-cluster-exec.nomad"

# Wait for the Nomad job to deploy
log_info "Waiting for the Nomad job to deploy..."
sleep 10

# Check the Nomad job status
nomad_status=$(nomad job status vault-cluster-exec | grep Status | head -1 | awk '{print $2}')
if [ "$nomad_status" != "running" ]; then
    log_warn "The Vault Nomad job is not 'running'; current status: $nomad_status"
    log_info "Check the job status with: nomad job status vault-cluster-exec"
fi

# Step 3: check Vault's status and initialize if needed
log_info "Step 3: checking Vault's status..."
export VAULT_ADDR='http://127.0.0.1:8200'

# Wait for Vault to start
log_info "Waiting for Vault to start..."
for i in {1..30}; do
    if curl -s "$VAULT_ADDR/v1/sys/health" > /dev/null; then
        break
    fi
    echo -n "."
    sleep 2
done
echo ""

# Check whether Vault is initialized
init_status=$(curl -s "$VAULT_ADDR/v1/sys/health" | grep -o '"initialized":[^,}]*' | cut -d ':' -f2)
if [ "$init_status" = "false" ]; then
    log_info "Vault is not initialized; initializing..."

    # Initialize Vault and save the keys
    mkdir -p "$ROOT_DIR/security/secrets/vault"
    vault operator init -key-shares=5 -key-threshold=3 -format=json > "$ROOT_DIR/security/secrets/vault/init_keys.json"

    if [ $? -eq 0 ]; then
        log_info "Vault initialized; unseal keys and root token saved to $ROOT_DIR/security/secrets/vault/init_keys.json"
        log_warn "Keep these keys safe!"

        # Extract the unseal keys (the init output is pretty-printed JSON, so parse it with jq)
        unseal_key1=$(jq -r '.unseal_keys_b64[0]' "$ROOT_DIR/security/secrets/vault/init_keys.json")
        unseal_key2=$(jq -r '.unseal_keys_b64[1]' "$ROOT_DIR/security/secrets/vault/init_keys.json")
        unseal_key3=$(jq -r '.unseal_keys_b64[2]' "$ROOT_DIR/security/secrets/vault/init_keys.json")

        # Unseal Vault
        log_info "Unsealing Vault..."
        vault operator unseal "$unseal_key1"
        vault operator unseal "$unseal_key2"
        vault operator unseal "$unseal_key3"

        log_info "Vault unsealed successfully"
    else
        log_error "Vault initialization failed"
        exit 1
    fi
else
    log_info "Vault is already initialized"

    # Check whether Vault is sealed
    sealed_status=$(curl -s "$VAULT_ADDR/v1/sys/health" | grep -o '"sealed":[^,}]*' | cut -d ':' -f2)
    if [ "$sealed_status" = "true" ]; then
        log_warn "Vault is initialized but still sealed; unseal it manually"
        log_info "Unseal Vault with:"
        log_info "export VAULT_ADDR='http://127.0.0.1:8200'"
        log_info "vault operator unseal <unseal key 1>"
        log_info "vault operator unseal <unseal key 2>"
        log_info "vault operator unseal <unseal key 3>"
    else
        log_info "Vault is initialized and unsealed, ready for use"
    fi
fi

# Show Vault's status
log_info "Vault status:"
vault status

log_info "===== Vault cluster deployment complete ====="
log_info "Run the unseal operation on the other nodes to bring the whole cluster online"
@ -1,50 +0,0 @@
#!/bin/bash
# Vault development environment usage examples

echo "===== Vault development environment usage examples ====="

# Set the environment variables
source /root/mgmt/security/secrets/vault/dev/vault_env.sh

echo "1. Check Vault's status"
vault status

echo ""
echo "2. Write an example secret"
vault kv put secret/myapp/config username="devuser" password="devpassword" database="devdb"

echo ""
echo "3. Read the example secret"
vault kv get secret/myapp/config

echo ""
echo "4. List the secret paths"
vault kv list secret/myapp/

echo ""
echo "5. Create an example policy"
cat > /tmp/dev-policy.hcl << EOF
# Example development-environment policy
path "secret/*" {
  capabilities = ["create", "read", "update", "delete", "list"]
}

path "sys/mounts" {
  capabilities = ["read"]
}
EOF

vault policy write dev-policy /tmp/dev-policy.hcl

echo ""
echo "6. Create a limited-privilege token"
vault token create -policy=dev-policy

echo ""
echo "7. Enabling and configuring other secret engines"
echo "Enable the database secrets engine:"
echo "vault secrets enable database"

echo ""
echo "===== Vault development environment examples complete ====="
echo "Note: these commands are for development testing only; do not reuse this configuration in production"
@ -1,56 +0,0 @@
#!/bin/bash
# Vault development environment quick start guide

echo "===== Vault development environment quick start ====="

# 1. Set the environment variables
echo "1. Set the environment variables"
source /root/mgmt/security/secrets/vault/dev/vault_env.sh
echo "VAULT_ADDR: $VAULT_ADDR"
echo "VAULT_TOKEN: $VAULT_TOKEN"

# 2. Check Vault's status
echo ""
echo "2. Check Vault's status"
vault status

# 3. Store a secret
echo ""
echo "3. Store a secret"
vault kv put secret/example/api_key value="my_secret_api_key_12345"

# 4. Read the secret back
echo ""
echo "4. Read the secret back"
vault kv get secret/example/api_key

# 5. List the secret paths
echo ""
echo "5. List the secret paths"
vault kv list secret/example/

# 6. Create an example policy
echo ""
echo "6. Create an example policy"
cat > /tmp/example-policy.hcl << EOF
# Example policy - allows reading the secrets under secret/example
path "secret/example/*" {
  capabilities = ["read", "list"]
}

# Allow listing the secret engines
path "sys/mounts" {
  capabilities = ["read"]
}
EOF

vault policy write example-policy /tmp/example-policy.hcl

# 7. Create a limited-privilege token
echo ""
echo "7. Create a limited-privilege token"
vault token create -policy=example-policy

echo ""
echo "===== Vault development environment quick start complete ====="
echo "You can now start using Vault in the development environment!"
@ -0,0 +1,62 @@
#!/bin/bash

# Consul cluster synchronization diagnostic script

echo "=== Consul cluster synchronization diagnostics ==="
echo "Time: $(date)"
echo ""

CONSUL_NODES=(
    "master.tailnet-68f9.ts.net:8500"
    "warden.tailnet-68f9.ts.net:8500"
    "ash3c.tailnet-68f9.ts.net:8500"
)

echo "1. Checking cluster status"
echo "=================="
for node in "${CONSUL_NODES[@]}"; do
    echo "Node: $node"
    echo "  Leader: $(curl -s http://$node/v1/status/leader 2>/dev/null || echo 'ERROR')"
    echo "  Peers: $(curl -s http://$node/v1/status/peers 2>/dev/null | jq length 2>/dev/null || echo 'ERROR')"
    echo ""
done

echo "2. Checking service registration"
echo "================"
for node in "${CONSUL_NODES[@]}"; do
    echo "Node: $node"
    echo "  Catalog services:"
    curl -s http://$node/v1/catalog/services 2>/dev/null | jq -r 'keys[]' 2>/dev/null | grep -E "(consul-lb|traefik)" | sed 's/^/    /' || echo "    ERROR or no services"

    echo "  Agent services:"
    curl -s http://$node/v1/agent/services 2>/dev/null | jq -r 'keys[]' 2>/dev/null | grep -E "traefik" | sed 's/^/    /' || echo "    no local services"
    echo ""
done

echo "3. Checking health status"
echo "================"
for node in "${CONSUL_NODES[@]}"; do
    echo "Node: $node"
    checks=$(curl -s http://$node/v1/agent/checks 2>/dev/null)
    if [ $? -eq 0 ]; then
        echo "$checks" | jq -r 'to_entries[] | select(.key | contains("traefik")) | "  \(.key): \(.value.Status)"' 2>/dev/null || echo "  no Traefik health checks"
    else
        echo "  ERROR: unreachable"
    fi
    echo ""
done

echo "4. Network connectivity test"
echo "=================="
echo "Testing the connection from this node to Traefik:"
curl -s -w "  HTTP %{http_code} - response time: %{time_total}s\n" -o /dev/null http://100.97.62.111:80/ || echo "  ERROR: cannot reach Traefik"
curl -s -w "  HTTP %{http_code} - response time: %{time_total}s\n" -o /dev/null http://100.97.62.111:8080/api/overview || echo "  ERROR: cannot reach the Traefik dashboard"

echo ""
echo "5. Suggested actions"
echo "==========="
echo "If problems are found:"
echo "  1. Re-register the services: ./scripts/register-traefik-to-all-consul.sh"
echo "  2. Check the Consul logs: nomad alloc logs \$(nomad job allocs consul-cluster-nomad | grep warden | awk '{print \$1}') consul"
echo "  3. Restart the problematic Consul nodes"
echo "  4. Check network connectivity and firewall settings"
@ -1,87 +0,0 @@
#!/bin/bash

# Script to link all MCP configuration files
# Links every IDE's and AI assistant's MCP configuration to the NFS-shared config file

NFS_CONFIG="/mnt/fnsync/mcp/mcp_shared_config.json"

echo "Linking all MCP configuration files to the NFS-shared config..."

# Check that the NFS config file exists
if [ ! -f "$NFS_CONFIG" ]; then
    echo "Error: NFS config file does not exist: $NFS_CONFIG"
    exit 1
fi

echo "✓ Using the NFS-shared config as the baseline: $NFS_CONFIG"

# All possible MCP configuration locations
CONFIGS=(
    # Kilo Code IDE (global config; the project-level config was removed to avoid conflicts)
    "../.trae-server/data/User/globalStorage/kilocode.kilo-code/settings/mcp_settings.json"

    # Tencent CodeBuddy
    "$HOME/.codebuddy-server/data/User/globalStorage/tencent.planning-genie/settings/codebuddy_mcp_settings.json"
    "$HOME/.codebuddy/data/User/globalStorage/tencent.planning-genie/settings/codebuddy_mcp_settings.json"
    # Newly added CodeBuddy-CN
    "$HOME/.codebuddy-server-cn/data/User/globalStorage/tencent.planning-genie/settings/codebuddy_mcp_settings.json"

    # Claude-related
    "$HOME/.claude.json"
    "$HOME/.claude.json.backup"
    "$HOME/.config/claude/settings/mcp_settings.json"

    # Cursor
    "$HOME/.cursor-server/data/User/globalStorage/xxx.cursor/settings/mcp_settings.json"

    # Qoder
    "$HOME/.qoder-server/data/User/globalStorage/xxx.qoder/settings/mcp_settings.json"

    # Cline
    "$HOME/.codebuddy-server/data/User/globalStorage/rooveterinaryinc.roo-cline/settings/mcp_settings.json"
    "$HOME/Cline/settings/mcp_settings.json"

    # Kiro
    "$HOME/.kiro-server/data/User/globalStorage/xxx.kiro/settings/mcp_settings.json"

    # Qwen
    "$HOME/.qwen/settings/mcp_settings.json"

    # VSCodium
    "$HOME/.vscodium-server/data/User/globalStorage/xxx.vscodium/settings/mcp_settings.json"

    # Other potential locations
    ".kilocode/mcp.json"
    "$HOME/.config/Qoder/SharedClientCache/mcp.json"
    "$HOME/.trae-server/data/Machine/mcp.json"
    "$HOME/.trae-cn-server/data/Machine/mcp.json"
    "$HOME/.codegeex/agent/configs/user_mcp_config.json"
    "$HOME/.codegeex/agent/configs/mcp_config.json"
)

# Link each configuration location
for config_path in "${CONFIGS[@]}"; do
    if [ -n "$config_path" ]; then
        config_dir=$(dirname "$config_path")
        if [ -d "$config_dir" ]; then
            # Back up an existing target file first
            if [ -f "$config_path" ]; then
                mv "$config_path" "${config_path}.backup"
                echo "✓ Existing config backed up: ${config_path}.backup"
            fi

            # Create the symlink
            ln -s "$NFS_CONFIG" "$config_path" 2>/dev/null
            if [ $? -eq 0 ]; then
                echo "✓ Linked: $config_path"
            else
                echo "✗ Failed to create link: $config_path"
            fi
        else
            echo "✗ Directory does not exist: $config_dir"
        fi
    fi
done
|
||||
|
||||
echo "所有MCP配置链接完成!"
|
||||
echo "所有IDE和AI助手现在都使用NFS共享的MCP配置文件: $NFS_CONFIG"
|
||||
|
|
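
A minimal verification sketch (not part of the removed script) that walks the same locations and reports whether each path is a symlink resolving to the NFS config; it assumes the CONFIGS array and NFS_CONFIG variable defined above are in scope.

# Verify that each config path is a symlink pointing at the NFS config.
for config_path in "${CONFIGS[@]}"; do
    if [ -L "$config_path" ] && [ "$(readlink -f "$config_path")" = "$(readlink -f "$NFS_CONFIG")" ]; then
        echo "✓ $config_path -> $NFS_CONFIG"
    elif [ -e "$config_path" ]; then
        echo "✗ $config_path exists but is not linked to the shared config"
    fi
done
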
@@ -1,380 +0,0 @@
#!/usr/bin/env python3
"""
Qdrant MCP server
Implements an MCP server integrated with the Qdrant vector database
"""

import asyncio
import json
import os
import sys
from typing import Any, Dict, List, Optional
import logging

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct, Filter

# Logging setup
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class QdrantMCPServer:
    def __init__(self):
        # Read configuration from environment variables
        self.qdrant_url = os.getenv("QDRANT_URL", "http://localhost:6333")
        self.qdrant_api_key = os.getenv("QDRANT_API_KEY", "")
        self.collection_name = os.getenv("COLLECTION_NAME", "mcp")
        self.embedding_model = os.getenv("EMBEDDING_MODEL", "bge-m3")

        # Initialize the Qdrant client
        self.client = QdrantClient(
            url=self.qdrant_url,
            api_key=self.qdrant_api_key if self.qdrant_api_key else None
        )

        # Make sure the collection exists
        self._ensure_collection_exists()

        logger.info("Qdrant MCP server initialized")
        logger.info(f"Qdrant URL: {self.qdrant_url}")
        logger.info(f"Collection name: {self.collection_name}")
        logger.info(f"Embedding model: {self.embedding_model}")

    def _ensure_collection_exists(self):
        """Create the collection if it does not exist yet"""
        try:
            collections = self.client.get_collections().collections
            collection_names = [collection.name for collection in collections]

            if self.collection_name not in collection_names:
                # Create a new collection
                self.client.create_collection(
                    collection_name=self.collection_name,
                    vectors_config=VectorParams(size=1024, distance=Distance.COSINE)
                )
                logger.info(f"Created new collection: {self.collection_name}")
            else:
                logger.info(f"Collection already exists: {self.collection_name}")
        except Exception as e:
            logger.error(f"Error while ensuring the collection exists: {e}")
            raise

    async def handle_request(self, request: Dict[str, Any]) -> Dict[str, Any]:
        """Handle an MCP request"""
        method = request.get("method")
        params = request.get("params", {})
        request_id = request.get("id")

        logger.info(f"Received request: {method}")

        try:
            if method == "initialize":
                result = await self.initialize(params)
            elif method == "tools/list":
                result = await self.list_tools(params)
            elif method == "tools/call":
                result = await self.call_tool(params)
            elif method == "resources/list":
                result = await self.list_resources(params)
            elif method == "resources/read":
                result = await self.read_resource(params)
            else:
                result = {
                    "error": {
                        "code": -32601,
                        "message": f"Unknown method: {method}"
                    }
                }
        except Exception as e:
            logger.error(f"Error while handling request: {e}")
            result = {
                "error": {
                    "code": -32603,
                    "message": f"Internal error: {str(e)}"
                }
            }

        response = {
            "jsonrpc": "2.0",
            "id": request_id,
            **result
        }

        return response

    async def initialize(self, params: Dict[str, Any]) -> Dict[str, Any]:
        """Initialize the MCP server"""
        logger.info("Initializing the Qdrant MCP server")

        return {
            "result": {
                "protocolVersion": "2024-11-05",
                "capabilities": {
                    "tools": {
                        "listChanged": False
                    },
                    "resources": {
                        "subscribe": False,
                        "listChanged": False
                    }
                },
                "serverInfo": {
                    "name": "qdrant-mcp-server",
                    "version": "1.0.0"
                }
            }
        }

    async def list_tools(self, params: Dict[str, Any]) -> Dict[str, Any]:
        """List available tools"""
        return {
            "result": {
                "tools": [
                    {
                        "name": "qdrant_search",
                        "description": "Search for similar vectors in Qdrant",
                        "inputSchema": {
                            "type": "object",
                            "properties": {
                                "query": {
                                    "type": "string",
                                    "description": "Search query text"
                                },
                                "limit": {
                                    "type": "integer",
                                    "default": 5,
                                    "description": "Maximum number of results"
                                }
                            },
                            "required": ["query"]
                        }
                    },
                    {
                        "name": "qdrant_add",
                        "description": "Add a vector to Qdrant",
                        "inputSchema": {
                            "type": "object",
                            "properties": {
                                "text": {
                                    "type": "string",
                                    "description": "Text content to add"
                                },
                                "metadata": {
                                    "type": "object",
                                    "description": "Metadata associated with the text"
                                }
                            },
                            "required": ["text"]
                        }
                    },
                    {
                        "name": "qdrant_delete",
                        "description": "Delete a vector from Qdrant",
                        "inputSchema": {
                            "type": "object",
                            "properties": {
                                "id": {
                                    "type": "string",
                                    "description": "ID of the vector to delete"
                                }
                            },
                            "required": ["id"]
                        }
                    }
                ]
            }
        }

    async def call_tool(self, params: Dict[str, Any]) -> Dict[str, Any]:
        """Invoke a tool"""
        name = params.get("name")
        arguments = params.get("arguments", {})

        if name == "qdrant_search":
            return await self._search_vectors(arguments)
        elif name == "qdrant_add":
            return await self._add_vector(arguments)
        elif name == "qdrant_delete":
            return await self._delete_vector(arguments)
        else:
            return {
                "error": {
                    "code": -32601,
                    "message": f"Unknown tool: {name}"
                }
            }

    async def _search_vectors(self, params: Dict[str, Any]) -> Dict[str, Any]:
        """Search for similar vectors"""
        query = params.get("query", "")
        limit = params.get("limit", 5)

        # The query should be converted to a vector by an embedding model.
        # Since no real embedding model is wired in, a placeholder is used.
        query_vector = [0.1] * 1024  # placeholder vector

        try:
            search_result = self.client.search(
                collection_name=self.collection_name,
                query_vector=query_vector,
                limit=limit
            )

            results = []
            for hit in search_result:
                results.append({
                    "id": hit.id,
                    "score": hit.score,
                    "payload": hit.payload
                })

            return {
                "result": {
                    "content": [
                        {
                            "type": "text",
                            "text": f"Search results: {json.dumps(results, ensure_ascii=False)}"
                        }
                    ]
                }
            }
        except Exception as e:
            logger.error(f"Error while searching vectors: {e}")
            return {
                "error": {
                    "code": -32603,
                    "message": f"Error while searching vectors: {str(e)}"
                }
            }

    async def _add_vector(self, params: Dict[str, Any]) -> Dict[str, Any]:
        """Add a vector"""
        text = params.get("text", "")
        metadata = params.get("metadata", {})

        # Derive a simple ID from the text
        import hashlib
        vector_id = hashlib.md5(text.encode()).hexdigest()

        # The text should be converted to a vector by an embedding model.
        # Since no real embedding model is wired in, a placeholder is used.
        vector = [0.1] * 1024  # placeholder vector

        try:
            self.client.upsert(
                collection_name=self.collection_name,
                points=[
                    PointStruct(
                        id=vector_id,
                        vector=vector,
                        payload={
                            "text": text,
                            **metadata
                        }
                    )
                ]
            )

            return {
                "result": {
                    "content": [
                        {
                            "type": "text",
                            "text": f"Vector added, ID: {vector_id}"
                        }
                    ]
                }
            }
        except Exception as e:
            logger.error(f"Error while adding a vector: {e}")
            return {
                "error": {
                    "code": -32603,
                    "message": f"Error while adding a vector: {str(e)}"
                }
            }

    async def _delete_vector(self, params: Dict[str, Any]) -> Dict[str, Any]:
        """Delete a vector"""
        vector_id = params.get("id", "")

        try:
            self.client.delete(
                collection_name=self.collection_name,
                points_selector=[vector_id]
            )

            return {
                "result": {
                    "content": [
                        {
                            "type": "text",
                            "text": f"Vector deleted, ID: {vector_id}"
                        }
                    ]
                }
            }
        except Exception as e:
            logger.error(f"Error while deleting a vector: {e}")
            return {
                "error": {
                    "code": -32603,
                    "message": f"Error while deleting a vector: {str(e)}"
                }
            }

    async def list_resources(self, params: Dict[str, Any]) -> Dict[str, Any]:
        """List resources"""
        return {
            "result": {
                "resources": []
            }
        }

    async def read_resource(self, params: Dict[str, Any]) -> Dict[str, Any]:
        """Read a resource"""
        return {
            "error": {
                "code": -32601,
                "message": "Reading resources is not supported"
            }
        }


async def main():
    """Entry point"""
    server = QdrantMCPServer()

    # Read requests from standard input
    for line in sys.stdin:
        try:
            request = json.loads(line)
            response = await server.handle_request(request)
            print(json.dumps(response, ensure_ascii=False))
            sys.stdout.flush()
        except json.JSONDecodeError as e:
            logger.error(f"Error while parsing JSON: {e}")
            error_response = {
                "jsonrpc": "2.0",
                "id": None,
                "error": {
                    "code": -32700,
                    "message": f"Error while parsing JSON: {str(e)}"
                }
            }
            print(json.dumps(error_response, ensure_ascii=False))
            sys.stdout.flush()
        except Exception as e:
            logger.error(f"Error while handling request: {e}")
            error_response = {
                "jsonrpc": "2.0",
                "id": None,
                "error": {
                    "code": -32603,
                    "message": f"Internal error: {str(e)}"
                }
            }
            print(json.dumps(error_response, ensure_ascii=False))
            sys.stdout.flush()


if __name__ == "__main__":
    asyncio.run(main())
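
Since the server speaks JSON-RPC over stdin/stdout, it can be smoke-tested with a single piped line. A sketch, assuming the file was saved as qdrant_mcp_server.py and a Qdrant instance is reachable at QDRANT_URL (the constructor connects on startup):

# One-line smoke test: list the tools the server advertises.
echo '{"jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": {}}' \
    | QDRANT_URL=http://localhost:6333 python3 qdrant_mcp_server.py | jq .
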
@@ -1,117 +0,0 @@
#!/usr/bin/env python3
"""
Qdrant + Ollama embedding model integration example
Demonstrates using Ollama as the embedding provider together with the Qdrant vector database
"""

from langchain_ollama import OllamaEmbeddings
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
import os


def main():
    # 1. Initialize the Ollama embedding model
    # nomic-embed-text is the embedding model recommended by Ollama
    print("Initializing the Ollama embedding model...")
    embeddings = OllamaEmbeddings(
        model="nomic-embed-text",
        base_url="http://localhost:11434"  # default Ollama address
    )

    # 2. Initialize the Qdrant client
    print("Connecting to the Qdrant database...")
    client = QdrantClient(
        url="http://localhost:6333",  # default Qdrant address
        api_key="313131"  # API key taken from the existing configuration
    )

    # 3. Create the collection (if it does not exist)
    collection_name = "ollama_integration_test"
    print(f"Creating or checking collection: {collection_name}")

    # First check whether the collection already exists
    collections = client.get_collections().collections
    collection_exists = any(collection.name == collection_name for collection in collections)

    if not collection_exists:
        # Create a new collection
        # First determine the embedding model's dimensionality
        sample_embedding = embeddings.embed_query("sample text")
        vector_size = len(sample_embedding)

        client.create_collection(
            collection_name=collection_name,
            vectors_config=VectorParams(
                size=vector_size,
                distance=Distance.COSINE
            )
        )
        print(f"Created new collection, vector dimension: {vector_size}")
    else:
        print("Collection already exists")

    # 4. Prepare sample data
    documents = [
        "Qdrant is a high-performance vector search engine",
        "Ollama is a tool for running large language models locally",
        "Vector databases store and retrieve high-dimensional vectors",
        "Embedding models turn text into numeric vector representations"
    ]

    metadata = [
        {"source": "qdrant_docs", "category": "database"},
        {"source": "ollama_docs", "category": "llm"},
        {"source": "vector_db_docs", "category": "database"},
        {"source": "embedding_docs", "category": "ml"}
    ]

    # 5. Generate embeddings with Ollama and store them in Qdrant
    print("Generating embeddings and storing them in Qdrant...")
    points = []

    for idx, (doc, meta) in enumerate(zip(documents, metadata)):
        # Generate the embedding with Ollama
        embedding = embeddings.embed_query(doc)

        # Create the Qdrant point
        point = PointStruct(
            id=idx,
            vector=embedding,
            payload={
                "text": doc,
                "metadata": meta
            }
        )
        points.append(point)

    # Upload the points to Qdrant
    client.upsert(
        collection_name=collection_name,
        points=points
    )
    print(f"Uploaded {len(points)} documents to Qdrant")

    # 6. Run a similarity search
    query = "What is a vector database?"
    print(f"\nRunning search query: '{query}'")

    # Generate the query embedding with Ollama
    query_embedding = embeddings.embed_query(query)

    # Search in Qdrant
    search_result = client.search(
        collection_name=collection_name,
        query_vector=query_embedding,
        limit=2
    )

    # 7. Show the search results
    print("\nSearch results:")
    for i, hit in enumerate(search_result, 1):
        print(f"{i}. {hit.payload['text']} (score: {hit.score:.4f})")
        print(f"   Metadata: {hit.payload['metadata']}")

    print("\nIntegration test finished!")


if __name__ == "__main__":
    main()
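
One practical prerequisite for this example: the embedding model must already be pulled into Ollama, or embed_query fails at step 1. A quick check, assuming Ollama's default API port:

# List the locally available Ollama models and pull the embedding model if missing.
curl -s http://localhost:11434/api/tags | jq -r '.models[].name'
ollama pull nomic-embed-text
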
@@ -1,357 +0,0 @@
#!/usr/bin/env python3
"""
MCP server integrating Qdrant with the Ollama embedding model
Implements an MCP server that uses Ollama as the embedding provider for the Qdrant vector database
"""

import asyncio
import json
import os
import sys
from typing import Any, Dict, List, Optional
import logging

from langchain_ollama import OllamaEmbeddings
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct, Filter

# Logging setup
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class QdrantOllamaMCPServer:
    def __init__(self):
        # Print the environment variables before initializing
        print("Environment variables:")
        print(f"QDRANT_URL: {os.getenv('QDRANT_URL', 'not set')}")
        print(f"QDRANT_API_KEY: {os.getenv('QDRANT_API_KEY', 'not set')}")
        print(f"OLLAMA_URL: {os.getenv('OLLAMA_URL', 'not set')}")
        print(f"OLLAMA_MODEL: {os.getenv('OLLAMA_MODEL', 'not set')}")
        print(f"COLLECTION_NAME: {os.getenv('COLLECTION_NAME', 'not set')}")

        # Read configuration from environment variables
        self.qdrant_url = os.getenv("QDRANT_URL", "http://dev1:6333")  # Qdrant address on the dev1 server
        self.qdrant_api_key = os.getenv("QDRANT_API_KEY", "313131")
        self.collection_name = os.getenv("COLLECTION_NAME", "ollama_mcp")
        self.ollama_model = os.getenv("OLLAMA_MODEL", "nomic-embed-text")
        self.ollama_url = os.getenv("OLLAMA_URL", "http://dev1:11434")  # Ollama address on the dev1 server

        # Initialize the clients
        self.embeddings = OllamaEmbeddings(
            model=self.ollama_model,
            base_url=self.ollama_url
        )

        self.client = QdrantClient(
            url=self.qdrant_url,
            api_key=self.qdrant_api_key
        )

        # Make sure the collection exists
        self._ensure_collection_exists()

        logger.info(f"Initialization complete, using collection: {self.collection_name}")

    def _ensure_collection_exists(self):
        """Create the collection if it does not exist yet"""
        collections = self.client.get_collections().collections
        collection_exists = any(collection.name == self.collection_name for collection in collections)

        if not collection_exists:
            # Determine the embedding model's dimensionality
            sample_embedding = self.embeddings.embed_query("sample text")
            vector_size = len(sample_embedding)

            self.client.create_collection(
                collection_name=self.collection_name,
                vectors_config=VectorParams(
                    size=vector_size,
                    distance=Distance.COSINE
                )
            )
            logger.info(f"Created new collection, vector dimension: {vector_size}")
        else:
            logger.info("Collection already exists")

    async def handle_request(self, request: Dict[str, Any]) -> Dict[str, Any]:
        """Handle an MCP request"""
        method = request.get("method")
        params = request.get("params", {})
        request_id = request.get("id")

        logger.info(f"Handling request: {method}")

        try:
            if method == "initialize":
                result = {
                    "protocolVersion": "2024-11-05",
                    "capabilities": {
                        "tools": {
                            "listChanged": True
                        },
                        "resources": {
                            "subscribe": True,
                            "listChanged": True
                        }
                    },
                    "serverInfo": {
                        "name": "qdrant-ollama-mcp-server",
                        "version": "1.0.0"
                    }
                }
            elif method == "tools/list":
                result = {
                    "tools": [
                        {
                            "name": "add_document",
                            "description": "Add a document to the vector database",
                            "inputSchema": {
                                "type": "object",
                                "properties": {
                                    "text": {
                                        "type": "string",
                                        "description": "Document text content"
                                    },
                                    "metadata": {
                                        "type": "object",
                                        "description": "Document metadata"
                                    }
                                },
                                "required": ["text"]
                            }
                        },
                        {
                            "name": "search_documents",
                            "description": "Search for similar documents in the vector database",
                            "inputSchema": {
                                "type": "object",
                                "properties": {
                                    "query": {
                                        "type": "string",
                                        "description": "Search query text"
                                    },
                                    "limit": {
                                        "type": "integer",
                                        "description": "Maximum number of results",
                                        "default": 5
                                    },
                                    "filter": {
                                        "type": "object",
                                        "description": "Search filter"
                                    }
                                },
                                "required": ["query"]
                            }
                        },
                        {
                            "name": "list_collections",
                            "description": "List all collections",
                            "inputSchema": {
                                "type": "object",
                                "properties": {}
                            }
                        },
                        {
                            "name": "get_collection_info",
                            "description": "Get information about a collection",
                            "inputSchema": {
                                "type": "object",
                                "properties": {
                                    "collection_name": {
                                        "type": "string",
                                        "description": "Collection name"
                                    }
                                },
                                "required": ["collection_name"]
                            }
                        }
                    ]
                }
            elif method == "tools/call":
                tool_name = params.get("name")
                tool_params = params.get("arguments", {})

                if tool_name == "add_document":
                    result = await self._add_document(tool_params)
                elif tool_name == "search_documents":
                    result = await self._search_documents(tool_params)
                elif tool_name == "list_collections":
                    result = await self._list_collections(tool_params)
                elif tool_name == "get_collection_info":
                    result = await self._get_collection_info(tool_params)
                else:
                    raise ValueError(f"Unknown tool: {tool_name}")
            else:
                raise ValueError(f"Unknown method: {method}")

            response = {
                "jsonrpc": "2.0",
                "id": request_id,
                "result": result
            }

        except Exception as e:
            logger.error(f"Error while handling request: {e}")
            response = {
                "jsonrpc": "2.0",
                "id": request_id,
                "error": {
                    "code": -1,
                    "message": str(e)
                }
            }

        return response

    async def _add_document(self, params: Dict[str, Any]) -> Dict[str, Any]:
        """Add a document to the vector database"""
        text = params.get("text")
        metadata = params.get("metadata", {})

        if not text:
            raise ValueError("Document text must not be empty")

        # Generate the embedding
        embedding = self.embeddings.embed_query(text)

        # Create the point
        point = PointStruct(
            id=hash(text) % (2 ** 31),  # use the text hash as the ID
            vector=embedding,
            payload={
                "text": text,
                "metadata": metadata
            }
        )

        # Upload to Qdrant
        self.client.upsert(
            collection_name=self.collection_name,
            points=[point]
        )

        return {"success": True, "message": "Document added"}

    async def _search_documents(self, params: Dict[str, Any]) -> Dict[str, Any]:
        """Search for similar documents in the vector database"""
        query = params.get("query")
        limit = params.get("limit", 5)
        filter_dict = params.get("filter")

        if not query:
            raise ValueError("Search query must not be empty")

        # Generate the query embedding
        query_embedding = self.embeddings.embed_query(query)

        # Build the filter
        search_filter = None
        if filter_dict:
            search_filter = Filter(**filter_dict)

        # Run the search
        search_result = self.client.search(
            collection_name=self.collection_name,
            query_vector=query_embedding,
            limit=limit,
            query_filter=search_filter
        )

        # Format the results
        results = []
        for hit in search_result:
            results.append({
                "text": hit.payload.get("text", ""),
                "metadata": hit.payload.get("metadata", {}),
                "score": hit.score
            })

        return {"results": results}

    async def _list_collections(self, params: Dict[str, Any]) -> Dict[str, Any]:
        """List all collections"""
        collections = self.client.get_collections().collections
        return {
            "collections": [
                {"name": collection.name} for collection in collections
            ]
        }

    async def _get_collection_info(self, params: Dict[str, Any]) -> Dict[str, Any]:
        """Get information about a collection"""
        collection_name = params.get("collection_name")

        if not collection_name:
            raise ValueError("Collection name must not be empty")

        try:
            collection_info = self.client.get_collection(collection_name)
            return {
                "name": collection_name,
                "vectors_count": collection_info.points_count,
                "vectors_config": collection_info.config.params.vectors.dict()
            }
        except Exception as e:
            raise ValueError(f"Failed to get collection info: {str(e)}")

    async def run(self):
        """Run the MCP server"""
        logger.info("Starting the Qdrant-Ollama MCP server")
        logger.info(f"Qdrant URL: {self.qdrant_url}")
        logger.info(f"Ollama URL: {self.ollama_url}")
        logger.info(f"Collection: {self.collection_name}")

        # Read requests from standard input
        while True:
            try:
                line = await asyncio.get_event_loop().run_in_executor(
                    None, sys.stdin.readline
                )
                if not line:
                    break

                logger.info(f"Received request: {line.strip()}")

                # Parse the JSON request
                request = json.loads(line.strip())

                # Handle the request
                response = await self.handle_request(request)

                # Send the response
                response_json = json.dumps(response)
                print(response_json, flush=True)
                logger.info(f"Sent response: {response_json}")

            except json.JSONDecodeError as e:
                logger.error(f"JSON parse error: {e}")
            except KeyboardInterrupt:
                logger.info("Server interrupted")
                break
            except Exception as e:
                logger.error(f"Error while handling request: {e}")


async def main():
    """Entry point"""
    # Configure logging
    logging.basicConfig(
        level=logging.INFO,
        format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
    )

    # Print the environment variables
    print("Environment variables:")
    print(f"QDRANT_URL: {os.getenv('QDRANT_URL', 'not set')}")
    print(f"QDRANT_API_KEY: {os.getenv('QDRANT_API_KEY', 'not set')}")
    print(f"OLLAMA_URL: {os.getenv('OLLAMA_URL', 'not set')}")
    print(f"OLLAMA_MODEL: {os.getenv('OLLAMA_MODEL', 'not set')}")
    print(f"COLLECTION_NAME: {os.getenv('COLLECTION_NAME', 'not set')}")

    # Create the server instance
    server = QdrantOllamaMCPServer()

    # Run the server
    await server.run()


if __name__ == "__main__":
    asyncio.run(main())
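
As with the plain Qdrant server above, this one can be driven directly over stdin. A sketch of an add_document call, assuming the file name qdrant_ollama_mcp_server.py, that Qdrant and Ollama are reachable, and noting that the startup prints go to stdout ahead of the JSON-RPC reply (hence the tail):

# Add a document through the MCP tools/call method.
echo '{"jsonrpc": "2.0", "id": 2, "method": "tools/call", "params": {"name": "add_document", "arguments": {"text": "Consul provides service discovery", "metadata": {"source": "notes"}}}}' \
    | python3 qdrant_ollama_mcp_server.py | tail -n 1 | jq .
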
@@ -1,10 +0,0 @@
#!/bin/bash
# Set the environment variables
export QDRANT_URL=http://dev1:6333
export QDRANT_API_KEY=313131
export OLLAMA_URL=http://dev1:11434
export OLLAMA_MODEL=nomic-embed-text
export COLLECTION_NAME=ollama_mcp

# Start the MCP server
python /home/ben/qdrant/qdrant_ollama_mcp_server.py
@@ -0,0 +1,68 @@
#!/bin/bash

# Register the Traefik services with all three Consul nodes
# Works around the Consul leader rotation problem

CONSUL_NODES=(
    "master.tailnet-68f9.ts.net:8500"
    "warden.tailnet-68f9.ts.net:8500"
    "ash3c.tailnet-68f9.ts.net:8500"
)

TRAEFIK_IP="100.97.62.111"
ALLOC_ID=$(nomad job allocs traefik-consul-lb | head -2 | tail -1 | awk '{print $1}')

SERVICE_DATA_LB="{
    \"ID\": \"traefik-consul-lb-${ALLOC_ID}\",
    \"Name\": \"consul-lb\",
    \"Tags\": [\"consul\", \"loadbalancer\", \"traefik\", \"multi-node\"],
    \"Address\": \"${TRAEFIK_IP}\",
    \"Port\": 80,
    \"Check\": {
        \"HTTP\": \"http://${TRAEFIK_IP}:80/\",
        \"Interval\": \"30s\",
        \"Timeout\": \"15s\"
    }
}"

SERVICE_DATA_DASHBOARD="{
    \"ID\": \"traefik-dashboard-${ALLOC_ID}\",
    \"Name\": \"traefik-dashboard\",
    \"Tags\": [\"traefik\", \"dashboard\", \"multi-node\"],
    \"Address\": \"${TRAEFIK_IP}\",
    \"Port\": 8080,
    \"Check\": {
        \"HTTP\": \"http://${TRAEFIK_IP}:8080/api/overview\",
        \"Interval\": \"30s\",
        \"Timeout\": \"15s\"
    }
}"

echo "Registering Traefik services to all Consul nodes..."
echo "Allocation ID: ${ALLOC_ID}"
echo "Traefik IP: ${TRAEFIK_IP}"

for node in "${CONSUL_NODES[@]}"; do
    echo "Registering to ${node}..."

    # Register the consul-lb service
    curl -s -X PUT "http://${node}/v1/agent/service/register" \
        -H "Content-Type: application/json" \
        -d "${SERVICE_DATA_LB}"

    # Register the traefik-dashboard service
    curl -s -X PUT "http://${node}/v1/agent/service/register" \
        -H "Content-Type: application/json" \
        -d "${SERVICE_DATA_DASHBOARD}"

    echo "✓ Registered to ${node}"
done

echo ""
echo "🎉 Services registered to all Consul nodes!"
echo ""
echo "Verification:"
for node in "${CONSUL_NODES[@]}"; do
    echo "Services on ${node}:"
    curl -s "http://${node}/v1/catalog/services" | jq -r 'keys[]' | grep -E "(consul-lb|traefik-dashboard)" | sed 's/^/  - /'
done
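
The counterpart cleanup is not scripted here; a minimal deregistration sketch using Consul's standard /v1/agent/service/deregister endpoint, reusing the CONSUL_NODES and ALLOC_ID variables defined above:

# Remove the two services from every node's local agent.
for node in "${CONSUL_NODES[@]}"; do
    curl -s -X PUT "http://${node}/v1/agent/service/deregister/traefik-consul-lb-${ALLOC_ID}"
    curl -s -X PUT "http://${node}/v1/agent/service/deregister/traefik-dashboard-${ALLOC_ID}"
    echo "✓ Deregistered from ${node}"
done
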
@@ -1,61 +0,0 @@
#!/bin/bash

# Consul configuration generation script
# Uses consul-template to render the final Consul configuration file from the KV store

set -e

# Parameters
CONSUL_ADDR="${CONSUL_ADDR:-localhost:8500}"
ENVIRONMENT="${ENVIRONMENT:-dev}"
CONSUL_CONFIG_DIR="${CONSUL_CONFIG_DIR:-/root/mgmt/components/consul/configs}"
CONSUL_TEMPLATE_CMD="${CONSUL_TEMPLATE_CMD:-consul-template}"

echo "Generating the Consul configuration file..."
echo "Consul address: $CONSUL_ADDR"
echo "Environment: $ENVIRONMENT"
echo "Config directory: $CONSUL_CONFIG_DIR"

# Check the Consul connection
echo "Checking the Consul connection..."
if ! curl -s "$CONSUL_ADDR/v1/status/leader" | grep -q "."; then
    echo "Error: unable to reach the Consul server $CONSUL_ADDR"
    exit 1
fi
echo "Consul connection OK"

# Check that consul-template is available
if ! command -v "$CONSUL_TEMPLATE_CMD" &> /dev/null; then
    echo "Error: the consul-template command is unavailable; please install consul-template"
    exit 1
fi

# Export environment variables for the template
export CONSUL_ADDR
export ENVIRONMENT

# Render the configuration file with consul-template
echo "Rendering the configuration file with consul-template..."
$CONSUL_TEMPLATE_CMD \
    -template="$CONSUL_CONFIG_DIR/consul.hcl.tmpl:$CONSUL_CONFIG_DIR/consul.hcl" \
    -once \
    -consul-addr="$CONSUL_ADDR"

# Validate the rendered configuration file
if [ -f "$CONSUL_CONFIG_DIR/consul.hcl" ]; then
    echo "Configuration file rendered successfully: $CONSUL_CONFIG_DIR/consul.hcl"

    # Validate the configuration syntax
    echo "Validating the configuration syntax..."
    if consul validate "$CONSUL_CONFIG_DIR/consul.hcl"; then
        echo "Configuration syntax is valid"
    else
        echo "Error: configuration syntax validation failed"
        exit 1
    fi
else
    echo "Error: failed to render the configuration file"
    exit 1
fi

echo "Consul configuration file generation finished"
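
The consul.hcl.tmpl file itself is not part of this diff; a hypothetical excerpt of what such a template could look like, assuming the KV keys written by the variable script below and consul-template's built-in key, env and printf functions:

# consul.hcl.tmpl (hypothetical excerpt, not from the repository)
datacenter       = "{{ key (printf "config/%s/consul/cluster/datacenter" (env "ENVIRONMENT")) }}"
data_dir         = "{{ key (printf "config/%s/consul/cluster/data_dir" (env "ENVIRONMENT")) }}"
log_level        = "{{ key (printf "config/%s/consul/cluster/log_level" (env "ENVIRONMENT")) }}"
bootstrap_expect = {{ key (printf "config/%s/consul/cluster/bootstrap_expect" (env "ENVIRONMENT")) }}
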
@@ -1,104 +0,0 @@
#!/bin/bash

# Consul variable configuration script - follows the naming best practice
# Stores the Consul cluster configuration in Consul KV using the config/{environment}/{provider}/{region_or_service}/{key} layout

set -e

# Parameters
CONSUL_ADDR="${CONSUL_ADDR:-localhost:8500}"
ENVIRONMENT="${ENVIRONMENT:-dev}"
CONSUL_CONFIG_DIR="${CONSUL_CONFIG_DIR:-/root/mgmt/components/consul/configs}"

echo "Configuring Consul variables following the naming best practice..."
echo "Consul address: $CONSUL_ADDR"
echo "Environment: $ENVIRONMENT"

# Check the Consul connection
echo "Checking the Consul connection..."
if ! curl -s "$CONSUL_ADDR/v1/status/leader" | grep -q "."; then
    echo "Error: unable to reach the Consul server $CONSUL_ADDR"
    exit 1
fi
echo "Consul connection OK"

# Create the Consul cluster configuration variables
echo "Creating Consul cluster configuration variables..."

# Base configuration
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/cluster/data_dir" -d "/opt/consul/data"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/cluster/raft_dir" -d "/opt/consul/raft"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/cluster/datacenter" -d "dc1"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/cluster/bootstrap_expect" -d "3"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/cluster/log_level" -d "INFO"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/cluster/encrypt_key" -d "YourEncryptionKeyHere"

# UI configuration
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/ui/enabled" -d "true"

# Network configuration
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/network/client_addr" -d "0.0.0.0"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/network/bind_interface" -d "eth0"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/network/advertise_interface" -d "eth0"

# Port configuration
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/ports/dns" -d "8600"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/ports/http" -d "8500"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/ports/https" -d "-1"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/ports/grpc" -d "8502"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/ports/grpc_tls" -d "8503"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/ports/serf_lan" -d "8301"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/ports/serf_wan" -d "8302"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/ports/server" -d "8300"

# Node configuration
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/nodes/master/ip" -d "100.117.106.136"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/nodes/ash3c/ip" -d "100.116.80.94"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/nodes/warden/ip" -d "100.122.197.112"

# Service discovery configuration
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/service/enable_script_checks" -d "true"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/service/enable_local_script_checks" -d "true"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/service/enable_service_script" -d "true"

# Performance configuration
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/performance/raft_multiplier" -d "1"

# Logging configuration
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/log/enable_syslog" -d "false"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/log/log_file" -d "/var/log/consul/consul.log"

# Connection configuration
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/connection/reconnect_timeout" -d "30s"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/connection/reconnect_timeout_wan" -d "30s"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/connection/session_ttl_min" -d "10s"

# Autopilot configuration
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/autopilot/cleanup_dead_servers" -d "true"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/autopilot/last_contact_threshold" -d "200ms"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/autopilot/max_trailing_logs" -d "250"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/autopilot/server_stabilization_time" -d "10s"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/autopilot/disable_upgrade_migration" -d "false"
# Leader priority configuration
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/autopilot/redundancy_zone_tag_master" -d "vice_president"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/autopilot/redundancy_zone_tag_warden" -d "president"

# Snapshot configuration
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/snapshot/enabled" -d "true"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/snapshot/interval" -d "24h"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/snapshot/retain" -d "30"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/snapshot/name" -d "consul-snapshot-{{.Timestamp}}"

# Backup configuration
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/backup/enabled" -d "true"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/backup/interval" -d "6h"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/backup/retain" -d "7"
curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/backup/name" -d "consul-backup-{{.Timestamp}}"

echo "Consul variable configuration finished"

# Verify the configuration
echo "Verifying the configuration..."
curl -s "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/?keys" | jq -r '.[]' | head -10

echo "Consul variable configuration script completed"
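
Values written this way can be read back either raw or in Consul's default base64-encoded JSON form; two equivalent checks, assuming the same CONSUL_ADDR and ENVIRONMENT as above and jq 1.6+ for @base64d:

# Raw value (Consul decodes it server-side with ?raw):
curl -s "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/cluster/datacenter?raw"
# Default JSON form carries the value base64-encoded:
curl -s "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/cluster/datacenter" | jq -r '.[0].Value | @base64d'
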
@@ -1,261 +0,0 @@
#!/bin/bash

# Consul variables and storage configuration script
# Used to extend the Consul cluster's functionality

set -e

# Colored output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color

# Logging helpers
log_info() {
    echo -e "${GREEN}[INFO]${NC} $1"
}

log_warn() {
    echo -e "${YELLOW}[WARN]${NC} $1"
}

log_error() {
    echo -e "${RED}[ERROR]${NC} $1"
}

# Default Consul address
CONSUL_ADDR=${CONSUL_ADDR:-"http://localhost:8500"}

# Check the Consul connection
check_consul() {
    log_info "Checking the Consul connection..."
    if curl -s "${CONSUL_ADDR}/v1/status/leader" > /dev/null; then
        log_info "Consul connection OK"
        return 0
    else
        log_error "Unable to reach Consul: ${CONSUL_ADDR}"
        return 1
    fi
}

# Configure Consul variables
setup_variables() {
    log_info "Configuring Consul variables..."

    # Environment
    ENVIRONMENT=${ENVIRONMENT:-"dev"}

    # Create the base configuration structure
    log_info "Creating the base configuration structure..."

    # Application configuration
    curl -s -X PUT "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT}/app/name" -d "my-application" > /dev/null
    curl -s -X PUT "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT}/app/version" -d "1.0.0" > /dev/null
    curl -s -X PUT "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT}/app/environment" -d "${ENVIRONMENT}" > /dev/null

    # Database configuration
    curl -s -X PUT "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT}/database/host" -d "db.example.com" > /dev/null
    curl -s -X PUT "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT}/database/port" -d "5432" > /dev/null
    curl -s -X PUT "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT}/database/name" -d "myapp_db" > /dev/null

    # Cache configuration
    curl -s -X PUT "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT}/cache/host" -d "redis.example.com" > /dev/null
    curl -s -X PUT "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT}/cache/port" -d "6379" > /dev/null

    # Message queue configuration
    curl -s -X PUT "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT}/mq/host" -d "mq.example.com" > /dev/null
    curl -s -X PUT "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT}/mq/port" -d "5672" > /dev/null

    # Feature flags
    curl -s -X PUT "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT}/features/new_ui" -d "true" > /dev/null
    curl -s -X PUT "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT}/features/advanced_analytics" -d "false" > /dev/null

    log_info "Consul variable configuration finished"
}

# Configure Consul storage
setup_storage() {
    log_info "Configuring Consul storage..."

    # Create the storage configuration
    # Note: these settings only take effect once the corresponding storage backend is enabled in the Consul configuration file

    # Persistent storage configuration
    curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/data_dir" -d "/opt/consul/data" > /dev/null
    curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/raft_dir" -d "/opt/consul/raft" > /dev/null

    # Snapshot configuration
    curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/snapshot_enabled" -d "true" > /dev/null
    curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/snapshot_interval" -d "24h" > /dev/null
    curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/snapshot_retention" -d "30" > /dev/null

    # Backup configuration
    curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/backup_enabled" -d "true" > /dev/null
    curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/backup_interval" -d "6h" > /dev/null
    curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/backup_retention" -d "7" > /dev/null

    # Autopilot cleanup configuration
    curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/autopilot/cleanup_dead_servers" -d "true" > /dev/null
    curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/autopilot/last_contact_threshold" -d "200ms" > /dev/null
    curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/autopilot/max_trailing_logs" -d "250" > /dev/null
    curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/autopilot/server_stabilization_time" -d "10s" > /dev/null
    curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/autopilot/redundancy_zone_tag" -d "" > /dev/null
    curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/autopilot/disable_upgrade_migration" -d "false" > /dev/null
    curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/autopilot/upgrade_version_tag" -d "" > /dev/null

    log_info "Consul storage configuration finished"
}

# Create the Consul configuration file
create_consul_config() {
    log_info "Creating the Consul configuration file..."

    # Create the configuration directory
    mkdir -p /root/mgmt/components/consul/configs

    # Write the base configuration file (heredoc delimiter quoted so the template literals and backticks stay verbatim)
    cat > /root/mgmt/components/consul/configs/consul.hcl << 'EOF'
# Consul base configuration
data_dir = "/opt/consul/data"
raft_dir = "/opt/consul/raft"

# Enable the UI
ui_config {
    enabled = true
}

# Datacenter
datacenter = "dc1"

# Server settings
server = true
bootstrap_expect = 3

# Client address
client_addr = "0.0.0.0"

# Bind address
bind_addr = "{{ GetInterfaceIP `eth0` }}"

# Advertise address
advertise_addr = "{{ GetInterfaceIP `eth0` }}"

# Port configuration
ports {
    dns = 8600
    http = 8500
    https = -1
    grpc = 8502
    grpc_tls = 8503
    serf_lan = 8301
    serf_wan = 8302
    server = 8300
}

# Join the other nodes
retry_join = ["100.117.106.136", "100.116.80.94", "100.122.197.112"]

# Enable service scripts
enable_service_script = true

# Enable script checks
enable_script_checks = true

# Enable local script checks
enable_local_script_checks = true

# Performance tuning
performance {
    raft_multiplier = 1
}

# Logging
log_level = "INFO"
enable_syslog = false
log_file = "/var/log/consul/consul.log"

# Gossip encryption
encrypt = "YourEncryptionKeyHere"

# Reconnect timeouts
reconnect_timeout = "30s"
reconnect_timeout_wan = "30s"

# Session TTL
session_ttl_min = "10s"

# Autopilot cleanup
autopilot {
    cleanup_dead_servers = true
    last_contact_threshold = "200ms"
    max_trailing_logs = 250
    server_stabilization_time = "10s"
    redundancy_zone_tag = ""
    disable_upgrade_migration = false
    upgrade_version_tag = ""
}

# Snapshot configuration
snapshot {
    enabled = true
    interval = "24h"
    retain = 30
    name = "consul-snapshot-{{.Timestamp}}"
}

# Backup configuration
backup {
    enabled = true
    interval = "6h"
    retain = 7
    name = "consul-backup-{{.Timestamp}}"
}
EOF

    log_info "Consul configuration file created: /root/mgmt/components/consul/configs/consul.hcl"
}

# Show the configuration
show_config() {
    log_info "Showing the Consul variable configuration..."
    echo "=========================================="
    curl -s "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT:-dev}/?recurse" | jq -r '.[] | "\(.Key): \(.Value | @base64d)"'
    echo "=========================================="

    log_info "Showing the Consul storage configuration..."
    echo "=========================================="
    curl -s "${CONSUL_ADDR}/v1/kv/storage/?recurse" | jq -r '.[] | "\(.Key): \(.Value | @base64d)"'
    echo "=========================================="
}

# Main
main() {
    log_info "Starting Consul variable and storage configuration..."

    # Check the Consul connection
    check_consul

    # Configure variables
    setup_variables

    # Configure storage
    setup_storage

    # Create the configuration file
    create_consul_config

    # Show the configuration
    show_config

    log_info "Consul variable and storage configuration finished"

    # Next steps
    log_info "Next steps:"
    log_info "1. Restart the Consul service to apply the new configuration"
    log_info "2. Verify that the configuration took effect"
    log_info "3. Adjust the parameters as needed"
}

# Run main
main "$@"
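
The same KV reads and writes are also available through the consul CLI, which can be handier interactively; a sketch, assuming the consul binary is on the PATH and CONSUL_HTTP_ADDR points at the cluster:

# CLI equivalents of the curl-based KV operations above.
consul kv put config/dev/app/name my-application
consul kv get config/dev/app/name
consul kv get -recurse config/dev/
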
@@ -1,149 +0,0 @@
#!/bin/bash

# Environment setup script
# Sets up the components and dependencies required by the development environment

set -euo pipefail

# Color definitions
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color

# Logging helpers
log_info() {
    echo -e "${BLUE}[INFO]${NC} $1"
}

log_success() {
    echo -e "${GREEN}[SUCCESS]${NC} $1"
}

log_warning() {
    echo -e "${YELLOW}[WARNING]${NC} $1"
}

log_error() {
    echo -e "${RED}[ERROR]${NC} $1"
}

# Check required tools
check_dependencies() {
    log_info "Checking system dependencies..."

    local deps=("git" "curl" "wget" "jq" "docker" "podman")
    local missing_deps=()

    for dep in "${deps[@]}"; do
        if ! command -v "$dep" &> /dev/null; then
            missing_deps+=("$dep")
        fi
    done

    if [ ${#missing_deps[@]} -ne 0 ]; then
        log_warning "Missing dependencies: ${missing_deps[*]}"
        log_info "Please install the missing dependencies and run again"
        return 1
    fi

    log_success "All dependency checks passed"
}

# Set up environment variables
setup_environment_variables() {
    log_info "Setting up environment variables..."

    # Create the environment variable file
    cat > .env << EOF
# Project environment variables
PROJECT_ROOT=$(pwd)
SCRIPTS_DIR=\${PROJECT_ROOT}/scripts

# Vault configuration
VAULT_ADDR=http://127.0.0.1:8200
VAULT_DEV_ROOT_TOKEN_ID=myroot

# Consul configuration
CONSUL_HTTP_ADDR=http://127.0.0.1:8500

# Nomad configuration
NOMAD_ADDR=http://127.0.0.1:4646

# MCP configuration
MCP_SERVER_PORT=3000
EOF

    log_success "Environment variable file created: .env"
}

# Create required directories
create_directories() {
    log_info "Creating required directories..."

    local dirs=(
        "logs"
        "tmp"
        "data"
        "backups/vault"
        "backups/consul"
        "backups/nomad"
    )

    for dir in "${dirs[@]}"; do
        mkdir -p "$dir"
        log_info "Created directory: $dir"
    done

    log_success "Directory creation finished"
}

# Make scripts executable
setup_script_permissions() {
    log_info "Setting script execute permissions..."

    find scripts/ -name "*.sh" -exec chmod +x {} \;

    log_success "Script permissions set"
}

# Initialize Git hooks (if needed)
setup_git_hooks() {
    log_info "Setting up Git hooks..."

    if [ -d ".git" ]; then
        # Create the pre-commit hook
        cat > .git/hooks/pre-commit << 'EOF'
#!/bin/bash
# Run basic code checks
echo "Running pre-commit checks..."

# Check shell script syntax
find scripts/ -name "*.sh" -exec bash -n {} \; || exit 1

echo "Pre-commit checks passed"
EOF
        chmod +x .git/hooks/pre-commit
        log_success "Git hooks set up"
    else
        log_warning "Not a Git repository; skipping Git hooks setup"
    fi
}

# Main
main() {
    log_info "Starting environment setup..."

    check_dependencies || exit 1
    setup_environment_variables
    create_directories
    setup_script_permissions
    setup_git_hooks

    log_success "Environment setup finished!"
    log_info "Run 'source .env' to load the environment variables"
}

# Run main
main "$@"