diff --git a/.kilocode/mcp.json.backup b/.kilocode/mcp.json.backup deleted file mode 120000 index 413e870..0000000 --- a/.kilocode/mcp.json.backup +++ /dev/null @@ -1 +0,0 @@ -/mnt/fnsync/mcp/mcp_shared_config.json \ No newline at end of file diff --git a/MCP_CONFIGURATION_SHARE.md b/MCP_CONFIGURATION_SHARE.md deleted file mode 100644 index 828f12a..0000000 --- a/MCP_CONFIGURATION_SHARE.md +++ /dev/null @@ -1,41 +0,0 @@ -# MCP 配置共享方案 - -本项目实现了跨主机多个IDE之间共享MCP(Model Context Protocol)配置的解决方案,使用NFS卷实现跨主机同步。 - -## 配置结构 - -- `/root/.mcp/mcp_settings.json` - 主MCP配置文件(符号链接指向NFS卷) -- `/mnt/fnsync/mcp/mcp_shared_config.json` - NFS卷上的统一配置文件(权威源) -- `mcp_shared_config.json` - 指向NFS卷上配置文件的符号链接 -- `sync_mcp_config.sh` - 同步脚本,用于将统一配置复制到各个IDE -- `sync_all_mcp_configs.sh` - 完整同步脚本,同步到所有可能的IDE和AI助手 -- `.kilocode/mcp.json` - 指向共享配置的符号链接 -- 其他IDE和AI助手的配置文件 - -## 统一配置内容 - -合并了以下MCP服务器: - -### 标准服务器 -- context7: 提供库文档和代码示例 -- filesystem: 文件系统访问 -- sequentialthinking: 顺序思考工具 -- git: Git 操作 -- time: 时间相关操作 -- memory: 知识图谱和记忆管理 -- tavily: 网络搜索功能 - -## 使用方法 - -1. **更新配置**: 编辑 `/mnt/fnsync/mcp/mcp_shared_config.json` 文件以修改MCP服务器配置(或通过符号链接 `/root/.mcp/mcp_settings.json`) -2. **同步配置**: - - 运行 `./sync_mcp_config.sh` 同步到特定IDE - - 运行 `./sync_all_mcp_configs.sh` 同步到所有IDE和AI助手 -3. **验证配置**: 确认各IDE中的MCP功能正常工作 - -## 维护说明 - -- 所有MCP配置更改都应在 `/mnt/fnsync/mcp/mcp_shared_config.json` 中进行(这是权威源) -- `/root/.mcp/mcp_settings.json` 现在是符号链接,指向NFS卷上的统一配置 -- 由于使用NFS卷,配置更改会自动跨主机共享 -- 如果添加新的IDE,可以将其配置文件链接到或复制自 `/mnt/fnsync/mcp/mcp_shared_config.json` \ No newline at end of file diff --git a/README.md.backup b/README.md.backup deleted file mode 100644 index e19f39f..0000000 --- a/README.md.backup +++ /dev/null @@ -1,572 +0,0 @@ -# 🏗️ 基础设施管理项目 - -这是一个现代化的多云基础设施管理平台,专注于 OpenTofu、Ansible 和 Nomad + Podman 的集成管理。 - -## 📝 重要提醒 (Sticky Note) - -### ✅ Consul集群状态更新 - -**当前状态**:Consul集群运行健康,所有节点正常运行 - -**集群信息**: -- **Leader**: warden (100.122.197.112:8300) -- **节点数量**: 3个服务器节点 -- **健康状态**: 所有节点健康检查通过 -- **节点列表**: - - master (100.117.106.136) - 韩国主节点 - - ash3c (100.116.80.94) - 美国服务器节点 - - warden (100.122.197.112) - 北京服务器节点,当前集群leader - -**配置状态**: -- Ansible inventory配置与实际集群状态一致 -- 所有节点均为服务器模式 -- bootstrap_expect=3,符合实际节点数量 - -**依赖关系**: -- Tailscale (第1天) ✅ -- Ansible (第2天) ✅ -- Nomad (第3天) ✅ -- Consul (第4天) ✅ **已完成** -- Terraform (第5天) ✅ **进展良好** -- Vault (第6天) ⏳ 计划中 -- Waypoint (第7天) ⏳ 计划中 - -**下一步计划**: -- 继续推进Terraform状态管理 -- 准备Vault密钥管理集成 -- 规划Waypoint应用部署流程 - ---- - -## 🎯 项目特性 - -- **🌩️ 多云支持**: Oracle Cloud, 华为云, Google Cloud, AWS, DigitalOcean -- **🏗️ 基础设施即代码**: 使用 OpenTofu 管理云资源 -- **⚙️ 配置管理**: 使用 Ansible 自动化配置和部署 -- **🐳 容器编排**: Nomad 集群管理和 Podman 容器运行时 -- **🔄 CI/CD**: Gitea Actions 自动化流水线 -- **📊 监控**: Prometheus + Grafana 监控体系 -- **🔐 安全**: 多层安全防护和合规性 - -## 🔄 架构分层与职责划分 - -### ⚠️ 重要:Terraform 与 Nomad 的职责区分 - -本项目采用分层架构,明确区分了不同工具的职责范围,避免混淆: - -#### 1. **Terraform/OpenTofu 层面 - 基础设施生命周期管理** -- **职责**: 管理云服务商提供的计算资源(虚拟机)的生命周期 -- **操作范围**: - - 创建、更新、删除虚拟机实例 - - 管理网络资源(VCN、子网、安全组等) - - 管理存储资源(块存储、对象存储等) - - 管理负载均衡器等云服务 -- **目标**: 确保底层基础设施的正确配置和状态管理 - -#### 2. **Nomad 层面 - 应用资源调度与编排** -- **职责**: 在已经运行起来的虚拟机内部进行资源分配和应用编排 -- **操作范围**: - - 在现有虚拟机上调度和运行容器化应用 - - 管理应用的生命周期(启动、停止、更新) - - 资源分配和限制(CPU、内存、存储) - - 服务发现和负载均衡 -- **目标**: 在已有基础设施上高效运行应用服务 - -#### 3. **关键区别** -- **Terraform** 关注的是**虚拟机本身**的生命周期管理 -- **Nomad** 关注的是**在虚拟机内部**运行的应用的资源调度 -- **Terraform** 决定"有哪些虚拟机" -- **Nomad** 决定"虚拟机上运行什么应用" - -#### 4. **工作流程示例** -``` -1. Terraform 创建虚拟机 (云服务商层面) - ↓ -2. 虚拟机启动并运行操作系统 - ↓ -3. 在虚拟机上安装和配置 Nomad 客户端 - ↓ -4. 
Nomad 在虚拟机上调度和运行应用容器 -``` - -**重要提醒**: 这两个层面不可混淆,Terraform 不应该管理应用层面的资源,Nomad 也不应该创建虚拟机。严格遵守此分层架构是项目成功的关键。 - -## 📁 项目结构 - -``` -mgmt/ -├── .gitea/workflows/ # CI/CD 工作流 -├── tofu/ # OpenTofu 基础设施代码 (基础设施生命周期管理) -│ ├── environments/ # 环境配置 (dev/staging/prod) -│ ├── modules/ # 可复用模块 -│ ├── providers/ # 云服务商配置 -│ └── shared/ # 共享配置 -├── configuration/ # Ansible 配置管理 -│ ├── inventories/ # 主机清单 -│ ├── playbooks/ # 剧本 -│ ├── templates/ # 模板文件 -│ └── group_vars/ # 组变量 -├── jobs/ # Nomad 作业定义 (应用资源调度与编排) -│ ├── consul/ # Consul 集群配置 -│ └── podman/ # Podman 相关作业 -├── configs/ # 配置文件 -│ ├── nomad-master.hcl # Nomad 主节点配置 -│ └── nomad-ash3c.hcl # Nomad 客户端配置 -├── docs/ # 文档 -├── security/ # 安全配置 -│ ├── certificates/ # 证书文件 -│ └── policies/ # 安全策略 -├── tests/ # 测试脚本和报告 -│ ├── mcp_servers/ # MCP服务器测试脚本 -│ ├── mcp_server_test_report.md # MCP服务器测试报告 -│ └── legacy/ # 旧的测试脚本 -├── tools/ # 工具和实用程序 -├── playbooks/ # 核心Ansible剧本 -└── Makefile # 项目管理命令 -``` - -**架构分层说明**: -- **tofu/** 目录包含 Terraform/OpenTofu 代码,负责管理云服务商提供的计算资源生命周期 -- **jobs/** 目录包含 Nomad 作业定义,负责在已有虚拟机内部进行应用资源调度 -- 这两个目录严格分离,确保职责边界清晰 - -**注意:** 项目已从 Docker Swarm 迁移到 Nomad + Podman,原有的 swarm 目录已不再使用。所有中间过程脚本和测试文件已清理,保留核心配置文件以符合GitOps原则。 - -## 🔄 GitOps 原则 - -本项目遵循 GitOps 工作流,确保基础设施状态与 Git 仓库中的代码保持一致: - -- **声明式配置**: 所有基础设施和应用程序配置都以声明式方式存储在 Git 中 -- **版本控制和审计**: 所有变更都通过 Git 提交,提供完整的变更历史和审计跟踪 -- **自动化同步**: 通过 CI/CD 流水线自动将 Git 中的变更应用到实际环境 -- **状态收敛**: 系统会持续监控实际状态,并自动修复任何与期望状态的偏差 - -### GitOps 工作流程 - -1. **声明期望状态**: 在 Git 中定义基础设施和应用程序的期望状态 -2. **提交变更**: 通过 Git 提交来应用变更 -3. **自动同步**: CI/CD 系统检测到变更并自动应用到环境 -4. **状态验证**: 系统验证实际状态与期望状态一致 -5. **监控和告警**: 持续监控状态并在出现偏差时发出告警 - -这种工作流确保了环境的一致性、可重复性和可靠性,同时提供了完整的变更历史和回滚能力。 - -## 🚀 快速开始 - -### 1. 环境准备 - -```bash -# 克隆项目 -git clone -cd mgmt - -# 检查环境状态 -./mgmt.sh status - -# 快速部署(适用于开发环境) -./mgmt.sh deploy -``` - -### 2. 配置云服务商 - -```bash -# 复制配置模板 -cp tofu/environments/dev/terraform.tfvars.example tofu/environments/dev/terraform.tfvars - -# 编辑配置文件,填入你的云服务商凭据 -vim tofu/environments/dev/terraform.tfvars -``` - -### 3. 初始化基础设施 - -```bash -# 初始化 OpenTofu -./mgmt.sh tofu init - -# 查看执行计划 -./mgmt.sh tofu plan - -# 应用基础设施变更 -cd tofu/environments/dev && tofu apply -``` - -### 4. 部署 Nomad 服务 - -```bash -# 部署 Consul 集群 -nomad run /root/mgmt/jobs/consul/consul-cluster.nomad - -# 查看 Nomad 任务 -nomad job status - -# 查看节点状态 -nomad node status -``` - -### ⚠️ 重要提示:网络访问注意事项 - -**Tailscale 网络访问**: -- 本项目中的 Nomad 和 Consul 服务通过 Tailscale 网络进行访问 -- 访问 Nomad (端口 4646) 和 Consul (端口 8500) 时,必须使用 Tailscale 分配的 IP 地址 -- 错误示例:`http://127.0.0.1:4646` 或 `http://localhost:8500` (无法连接) -- 正确示例:`http://100.x.x.x:4646` 或 `http://100.x.x.x:8500` (使用 Tailscale IP) - -**获取 Tailscale IP**: -```bash -# 查看当前节点的 Tailscale IP -tailscale ip -4 - -# 查看所有 Tailscale 网络中的节点 -tailscale status -``` - -**常见问题**: -- 如果遇到 "connection refused" 错误,请确认是否使用了正确的 Tailscale IP -- 确保 Tailscale 服务已启动并正常运行 -- 检查网络策略是否允许通过 Tailscale 接口访问相关端口 -- 更多详细经验和解决方案,请参考:[Consul 和 Nomad 访问问题经验教训](.gitea/issues/consul-nomad-access-lesson.md) - -### 🔄 Nomad 集群领导者轮换与访问策略 - -**Nomad 集群领导者机制**: -- Nomad 使用 Raft 协议实现分布式一致性,集群中只有一个领导者节点 -- 领导者节点负责处理所有写入操作和协调集群状态 -- 当领导者节点故障时,集群会自动选举新的领导者 - -**领导者轮换时的访问策略**: - -1. **动态发现领导者**: -```bash -# 查询当前领导者节点 -curl -s http://<任意Nomad服务器IP>:4646/v1/status/leader -# 返回结果示例: "100.90.159.68:4647" - -# 使用返回的领导者地址进行API调用 -curl -s http://100.90.159.68:4646/v1/nodes -``` - -2. **负载均衡方案**: - - **DNS 负载均衡**:使用 Consul DNS 服务,通过 `nomad.service.consul` 解析到当前领导者 - - **代理层负载均衡**:在 Nginx/HAProxy 配置中添加健康检查,自动路由到活跃的领导者节点 - - **客户端重试机制**:在客户端代码中实现重试逻辑,当连接失败时尝试其他服务器节点 - -3. 
**推荐访问模式**: -```bash -# 使用领导者发现脚本 -#!/bin/bash -# 获取任意一个Nomad服务器IP -SERVER_IP="100.116.158.95" -# 查询当前领导者 -LEADER=$(curl -s http://${SERVER_IP}:4646/v1/status/leader | sed 's/"//g') -# 使用领导者地址执行命令 -nomad node status -address=http://${LEADER} -``` - -4. **高可用性配置**: - - 将所有 Nomad 服务器节点添加到客户端配置中 - - 客户端会自动连接到可用的服务器节点 - - 对于写入操作,客户端会自动重定向到领导者节点 - -**注意事项**: -- Nomad 集群领导者轮换是自动进行的,通常不需要人工干预 -- 在领导者选举期间,集群可能会短暂无法处理写入操作 -- 建议在应用程序中实现适当的重试逻辑,以处理领导者切换期间的临时故障 - -## 🛠️ 常用命令 - -| 命令 | 描述 | -|------|------| -| `make status` | 显示项目状态总览 | -| `make deploy` | 快速部署所有服务 | -| `make cleanup` | 清理所有部署的服务 | -| `cd tofu/environments/dev && tofu ` | OpenTofu 管理命令 | -| `nomad job status` | 查看 Nomad 任务状态 | -| `nomad node status` | 查看 Nomad 节点状态 | -| `podman ps` | 查看运行中的容器 | -| `ansible-playbook playbooks/configure-nomad-clients.yml` | 配置 Nomad 客户端 | -| `./run_tests.sh` 或 `make test-mcp` | 运行所有MCP服务器测试 | -| `make test-kali` | 运行Kali Linux快速健康检查 | -| `make test-kali-security` | 运行Kali Linux安全工具测试 | -| `make test-kali-full` | 运行Kali Linux完整测试套件 | - -## 🌩️ 支持的云服务商 - -### Oracle Cloud Infrastructure (OCI) -- ✅ 计算实例 -- ✅ 网络配置 (VCN, 子网, 安全组) -- ✅ 存储 (块存储, 对象存储) -- ✅ 负载均衡器 - -### 华为云 -- ✅ 弹性云服务器 (ECS) -- ✅ 虚拟私有云 (VPC) -- ✅ 弹性负载均衡 (ELB) -- ✅ 云硬盘 (EVS) - -### Google Cloud Platform -- ✅ Compute Engine -- ✅ VPC 网络 -- ✅ Cloud Load Balancing -- ✅ Persistent Disk - -### Amazon Web Services -- ✅ EC2 实例 -- ✅ VPC 网络 -- ✅ Application Load Balancer -- ✅ EBS 存储 - -### DigitalOcean -- ✅ Droplets -- ✅ VPC 网络 -- ✅ Load Balancers -- ✅ Block Storage - -## 🔄 CI/CD 流程 - -### 基础设施部署流程 -1. **代码提交** → 触发 Gitea Actions -2. **OpenTofu Plan** → 生成执行计划 -3. **人工审核** → 确认变更 -4. **OpenTofu Apply** → 应用基础设施变更 -5. **Ansible 部署** → 配置和部署应用 - -### 应用部署流程 -1. **应用代码更新** → 构建容器镜像 -2. **镜像推送** → 推送到镜像仓库 -3. **Nomad Job 更新** → 更新任务定义 -4. **Nomad 部署** → 滚动更新服务 -5. **健康检查** → 验证部署状态 - -## 📊 监控和可观测性 - -### 监控组件 -- **Prometheus**: 指标收集和存储 -- **Grafana**: 可视化仪表板 -- **AlertManager**: 告警管理 -- **Node Exporter**: 系统指标导出 - -### 日志管理 -- **ELK Stack**: Elasticsearch + Logstash + Kibana -- **Fluentd**: 日志收集和转发 -- **结构化日志**: JSON 格式标准化 - -## 🔐 安全最佳实践 - -### 基础设施安全 -- **网络隔离**: VPC, 安全组, 防火墙 -- **访问控制**: IAM 角色和策略 -- **数据加密**: 传输和静态加密 -- **密钥管理**: 云服务商密钥管理服务 - -### 应用安全 -- **容器安全**: 镜像扫描, 最小权限 -- **网络安全**: 服务网格, TLS 终止 -- **秘密管理**: Docker Secrets, Ansible Vault -- **安全审计**: 日志监控和审计 - -## 🧪 测试策略 - -### 基础设施测试 -- **语法检查**: OpenTofu validate -- **安全扫描**: Checkov, tfsec -- **合规检查**: OPA (Open Policy Agent) - -### 应用测试 -- **单元测试**: 应用代码测试 -- **集成测试**: 服务间集成测试 -- **端到端测试**: 完整流程测试 - -### MCP服务器测试 -项目包含完整的MCP(Model Context Protocol)服务器测试套件,位于 `tests/mcp_servers/` 目录: - -- **context7服务器测试**: 验证初始化、工具列表和搜索功能 -- **qdrant服务器测试**: 测试文档添加、搜索和删除功能 -- **qdrant-ollama服务器测试**: 验证向量数据库与LLM集成功能 - -测试脚本包括Shell脚本和Python脚本,支持通过JSON-RPC协议直接测试MCP服务器功能。详细的测试结果和问题修复记录请参考 `tests/mcp_server_test_report.md`。 - -运行测试: -```bash -# 运行单个测试脚本 -cd tests/mcp_servers -./test_local_mcp_servers.sh - -# 或运行Python测试 -python test_mcp_servers_simple.py -``` - -### Kali Linux系统测试 -项目包含完整的Kali Linux系统测试套件,位于 `configuration/playbooks/test/` 目录。测试包括: - -1. **快速健康检查** (`kali-health-check.yml`): 基本系统状态检查 -2. **安全工具测试** (`kali-security-tools.yml`): 测试各种安全工具的安装和功能 -3. **完整系统测试** (`test-kali.yml`): 全面的系统测试和报告生成 -4. 
**完整测试套件** (`kali-full-test-suite.yml`): 按顺序执行所有测试 - -运行测试: -```bash -# Kali Linux快速健康检查 -make test-kali - -# Kali Linux安全工具测试 -make test-kali-security - -# Kali Linux完整测试套件 -make test-kali-full -``` - -## 📚 文档 - -- [Consul集群故障排除](docs/consul-cluster-troubleshooting.md) -- [磁盘管理](docs/disk-management.md) -- [Nomad NFS设置](docs/nomad-nfs-setup.md) -- [Consul-Terraform集成](docs/setup/consul-terraform-integration.md) -- [OCI凭据设置](docs/setup/oci-credentials-setup.md) -- [Oracle云设置](docs/setup/oracle-cloud-setup.md) - -## 🤝 贡献指南 - -1. Fork 项目 -2. 创建特性分支 (`git checkout -b feature/amazing-feature`) -3. 提交变更 (`git commit -m 'Add amazing feature'`) -4. 推送到分支 (`git push origin feature/amazing-feature`) -5. 创建 Pull Request - -## 📄 许可证 - -本项目采用 MIT 许可证 - 查看 [LICENSE](LICENSE) 文件了解详情。 - -## 🆘 支持 - -如果你遇到问题或有疑问: - -1. 查看 [文档](docs/) -2. 搜索 [Issues](../../issues) -3. 创建新的 [Issue](../../issues/new) - -## ⚠️ 重要经验教训 - -### Terraform 与 Nomad 职责区分 -**问题**:在基础设施管理中容易混淆 Terraform 和 Nomad 的职责范围,导致架构设计混乱。 - -**根本原因**:Terraform 和 Nomad 虽然都是基础设施管理工具,但它们在架构中处于不同层面,负责不同类型的资源管理。 - -**解决方案**: -1. **明确分层架构**: - - **Terraform/OpenTofu**:负责云服务商提供的计算资源(虚拟机)的生命周期管理 - - **Nomad**:负责在已有虚拟机内部进行应用资源调度和编排 - -2. **职责边界清晰**: - - Terraform 决定"有哪些虚拟机" - - Nomad 决定"虚拟机上运行什么应用" - - 两者不应越界管理对方的资源 - -3. **工作流程分离**: - ``` - 1. Terraform 创建虚拟机 (云服务商层面) - ↓ - 2. 虚拟机启动并运行操作系统 - ↓ - 3. 在虚拟机上安装和配置 Nomad 客户端 - ↓ - 4. Nomad 在虚拟机上调度和运行应用容器 - ``` - -**重要提醒**:严格遵守这种分层架构是项目成功的关键。任何混淆这两个层面职责的做法都会导致架构混乱和管理困难。 - -### Consul 和 Nomad 访问问题 -**问题**:尝试访问 Consul 服务时,使用 `http://localhost:8500` 或 `http://127.0.0.1:8500` 无法连接。 - -**根本原因**:本项目中的 Consul 和 Nomad 服务通过 Nomad + Podman 在集群中运行,并通过 Tailscale 网络进行访问。这些服务不在本地运行,因此无法通过 localhost 访问。 - -**解决方案**: -1. **使用 Tailscale IP**:必须使用 Tailscale 分配的 IP 地址访问服务 - ```bash - # 查看当前节点的 Tailscale IP - tailscale ip -4 - - # 查看所有 Tailscale 网络中的节点 - tailscale status - - # 访问 Consul (使用实际的 Tailscale IP) - curl http://100.x.x.x:8500/v1/status/leader - - # 访问 Nomad (使用实际的 Tailscale IP) - curl http://100.x.x.x:4646/v1/status/leader - ``` - -2. **服务发现**:Consul 集群由 3 个节点组成,Nomad 集群由十多个节点组成,需要正确识别服务运行的节点 - -3. **集群架构**: - - Consul 集群:3 个节点 (kr-master, us-ash3c, bj-warden) - - Nomad 集群:十多个节点,包括服务器节点和客户端节点 - -**重要提醒**:在开发和调试过程中,始终记住使用 Tailscale IP 而不是 localhost 访问集群服务。这是本项目架构的基本要求,必须严格遵守。 - -### Consul 集群配置管理经验 -**问题**:Consul集群配置文件与实际运行状态不一致,导致集群管理混乱和配置错误。 - -**根本原因**:Ansible inventory配置文件中的节点信息与实际Consul集群中的节点状态不匹配,包括节点角色、数量和expect值等关键配置。 - -**解决方案**: -1. **定期验证集群状态**:使用Consul API定期检查集群实际状态,确保配置文件与实际运行状态一致 - ```bash - # 查看Consul集群节点信息 - curl -s http://:8500/v1/catalog/nodes - - # 查看节点详细信息 - curl -s http://:8500/v1/agent/members - - # 查看集群leader信息 - curl -s http://:8500/v1/status/leader - ``` - -2. **保持配置文件一致性**:确保所有相关的inventory配置文件(如`csol-consul-nodes.ini`、`consul-nodes.ini`、`consul-cluster.ini`)保持一致,包括: - - 服务器节点列表和数量 - - 客户端节点列表和数量 - - `bootstrap_expect`值(必须与实际服务器节点数量匹配) - - 节点角色和IP地址 - -3. **正确识别节点角色**:通过API查询确认每个节点的实际角色,避免将服务器节点误配置为客户端节点,或反之 - ```json - // API返回的节点信息示例 - { - "Name": "warden", - "Addr": "100.122.197.112", - "Port": 8300, - "Status": 1, - "ProtocolVersion": 2, - "Delegate": 1, - "Server": true // 确认节点角色 - } - ``` - -4. 
**更新配置流程**:当发现配置与实际状态不匹配时,按照以下步骤更新: - - 使用API获取集群实际状态 - - 根据实际状态更新所有相关配置文件 - - 确保所有配置文件中的信息保持一致 - - 更新配置文件中的说明和注释,反映最新的集群状态 - -**实际案例**: -- **初始状态**:配置文件显示2个服务器节点和5个客户端节点,`bootstrap_expect=2` -- **实际状态**:Consul集群运行3个服务器节点(master、ash3c、warden),无客户端节点,`expect=3` -- **解决方案**:更新所有配置文件,将服务器节点数量改为3个,移除所有客户端节点配置,将`bootstrap_expect`值更新为3 - -**重要提醒**:Consul集群配置必须与实际运行状态保持严格一致。任何不匹配都可能导致集群不稳定或功能异常。定期使用Consul API验证集群状态,并及时更新配置文件,是确保集群稳定运行的关键。 - -## 🎉 致谢 - -感谢所有为这个项目做出贡献的开发者和社区成员! -## 脚本整理 - -项目脚本已重新整理,按功能分类存放在 `scripts/` 目录中: - -- `scripts/setup/` - 环境设置和初始化 -- `scripts/deployment/` - 部署相关脚本 -- `scripts/testing/` - 测试脚本 -- `scripts/utilities/` - 工具脚本 -- `scripts/mcp/` - MCP 服务器相关 -- `scripts/ci-cd/` - CI/CD 相关 - -详细信息请查看 [脚本索引](scripts/SCRIPT_INDEX.md)。 - diff --git a/ansible/consul-client-deployment.yml b/ansible/consul-client-deployment.yml new file mode 100644 index 0000000..a8f7261 --- /dev/null +++ b/ansible/consul-client-deployment.yml @@ -0,0 +1,104 @@ +--- +# Ansible Playbook: 部署 Consul Client 到所有 Nomad 节点 +- name: Deploy Consul Client to Nomad nodes + hosts: nomad_clients:nomad_servers + become: yes + vars: + consul_version: "1.21.5" + consul_datacenter: "dc1" + consul_servers: + - "100.117.106.136:8300" # master (韩国) + - "100.122.197.112:8300" # warden (北京) + - "100.116.80.94:8300" # ash3c (美国) + + tasks: + - name: Update APT cache + apt: + update_cache: yes + + - name: Install consul via APT (假设源已存在) + apt: + name: consul={{ consul_version }}-* + state: present + update_cache: yes + register: consul_installed + + - name: Create consul user (if not exists) + user: + name: consul + system: yes + shell: /bin/false + home: /opt/consul + create_home: yes + + - name: Create consul directories + file: + path: "{{ item }}" + state: directory + owner: consul + group: consul + mode: '0755' + loop: + - /opt/consul + - /opt/consul/data + - /etc/consul.d + - /var/log/consul + + - name: Get node Tailscale IP + shell: ip addr show tailscale0 | grep 'inet ' | awk '{print $2}' | cut -d'/' -f1 + register: tailscale_ip + failed_when: tailscale_ip.stdout == "" + + - name: Create consul client configuration + template: + src: templates/consul-client.hcl.j2 + dest: /etc/consul.d/consul.hcl + owner: consul + group: consul + mode: '0644' + notify: restart consul + + - name: Create consul systemd service + template: + src: templates/consul.service.j2 + dest: /etc/systemd/system/consul.service + owner: root + group: root + mode: '0644' + notify: reload systemd + + - name: Enable and start consul service + systemd: + name: consul + enabled: yes + state: started + notify: restart consul + + - name: Wait for consul to be ready + uri: + url: "http://{{ tailscale_ip.stdout }}:8500/v1/status/leader" + status_code: 200 + timeout: 5 + register: consul_leader_status + until: consul_leader_status.status == 200 + retries: 30 + delay: 5 + + - name: Verify consul cluster membership + shell: consul members -status=alive -format=json | jq -r '.[].Name' + register: consul_members + changed_when: false + + - name: Display cluster status + debug: + msg: "Node {{ inventory_hostname.split('.')[0] }} joined cluster with {{ consul_members.stdout_lines | length }} members" + + handlers: + - name: reload systemd + systemd: + daemon_reload: yes + + - name: restart consul + systemd: + name: consul + state: restarted \ No newline at end of file diff --git a/ansible/inventory/hosts.yml b/ansible/inventory/hosts.yml new file mode 100644 index 0000000..2b31d4f --- /dev/null +++ b/ansible/inventory/hosts.yml @@ -0,0 +1,59 @@ +--- +# Ansible 
Inventory for Consul Client Deployment
+all:
+  children:
+    consul_servers:
+      hosts:
+        master.tailnet-68f9.ts.net:
+          ansible_host: 100.117.106.136
+          region: korea
+        warden.tailnet-68f9.ts.net:
+          ansible_host: 100.122.197.112
+          region: beijing
+        ash3c.tailnet-68f9.ts.net:
+          ansible_host: 100.116.80.94
+          region: usa
+
+    nomad_servers:
+      hosts:
+        # Nomad Server 节点也需要 Consul Client
+        semaphore.tailnet-68f9.ts.net:
+          ansible_host: 100.116.158.95
+          region: korea
+        ch3.tailnet-68f9.ts.net:
+          ansible_host: 100.86.141.112
+          region: switzerland
+        ash1d.tailnet-68f9.ts.net:
+          ansible_host: 100.81.26.3
+          region: usa
+        ash2e.tailnet-68f9.ts.net:
+          ansible_host: 100.103.147.94
+          region: usa
+        ch2.tailnet-68f9.ts.net:
+          ansible_host: 100.90.159.68
+          region: switzerland
+        de.tailnet-68f9.ts.net:
+          ansible_host: 100.120.225.29
+          region: germany
+        onecloud1.tailnet-68f9.ts.net:
+          ansible_host: 100.98.209.50
+          region: unknown
+
+    nomad_clients:
+      hosts:
+        # 需要部署 Consul Client 的节点
+        influxdb1.tailnet-68f9.ts.net:
+          ansible_host: "{{ influxdb1_ip }}"  # 需要填入实际IP
+          region: beijing
+        browser.tailnet-68f9.ts.net:
+          ansible_host: "{{ browser_ip }}"  # 需要填入实际IP
+          region: beijing
+        # hcp1 已经有 Consul Client,可选择重新配置
+        # hcp1.tailnet-68f9.ts.net:
+        #   ansible_host: 100.97.62.111
+        #   region: beijing
+
+  vars:
+    ansible_user: root
+    ansible_ssh_private_key_file: ~/.ssh/id_rsa
+    consul_datacenter: dc1
diff --git a/ansible/templates/consul-client.hcl.j2 b/ansible/templates/consul-client.hcl.j2
new file mode 100644
index 0000000..3023dde
--- /dev/null
+++ b/ansible/templates/consul-client.hcl.j2
@@ -0,0 +1,65 @@
+# Consul Client Configuration for {{ inventory_hostname }}
+datacenter = "{{ consul_datacenter }}"
+data_dir = "/opt/consul/data"
+log_level = "INFO"
+node_name = "{{ inventory_hostname.split('.')[0] }}"
+bind_addr = "{{ tailscale_ip.stdout }}"
+
+# HTTP/DNS 接口监听地址。Consul 默认只监听 127.0.0.1,
+# 不设置此项时,playbook 中基于 Tailscale IP 的就绪检查会失败
+client_addr = "0.0.0.0"
+
+# Client mode (not server)
+server = false
+
+# Connect to Consul servers (指向三节点集群)
+retry_join = [
+  "100.117.106.136",  # master (韩国)
+  "100.122.197.112",  # warden (北京)
+  "100.116.80.94"     # ash3c (美国)
+]
+
+# Performance optimization
+performance {
+  raft_multiplier = 5
+}
+
+# Ports configuration
+ports {
+  grpc = 8502
+  http = 8500
+  dns = 8600
+}
+
+# Enable Connect for service mesh
+connect {
+  enabled = true
+}
+
+# Cache configuration for performance
+cache {
+  entry_fetch_max_burst = 42
+  entry_fetch_rate = 30
+}
+
+# Node metadata
+node_meta = {
+  region = "{{ region | default('unknown') }}"
+  zone = "nomad-server"
+}
+
+# UI disabled for clients
+ui_config {
+  enabled = false
+}
+
+# ACL configuration (if needed)
+acl = {
+  enabled = false
+  default_policy = "allow"
+}
+
+# Logging
+log_file = "/var/log/consul/consul.log"
+log_rotate_duration = "24h"
+log_rotate_max_files = 7
diff --git a/ansible/templates/consul.service.j2 b/ansible/templates/consul.service.j2
new file mode 100644
index 0000000..2b941e1
--- /dev/null
+++ b/ansible/templates/consul.service.j2
@@ -0,0 +1,26 @@
+[Unit]
+Description=Consul Client
+Documentation=https://www.consul.io/
+Requires=network-online.target
+After=network-online.target
+ConditionFileNotEmpty=/etc/consul.d/consul.hcl
+
+[Service]
+Type=notify
+User=consul
+Group=consul
+ExecStart=/usr/bin/consul agent -config-dir=/etc/consul.d
+ExecReload=/bin/kill -HUP $MAINPID
+KillMode=process
+Restart=on-failure
+LimitNOFILE=65536
+
+# Security settings
+NoNewPrivileges=yes
+PrivateTmp=yes
+ProtectHome=yes
+ProtectSystem=strict
+ReadWritePaths=/opt/consul /var/log/consul
+
+[Install]
+WantedBy=multi-user.target
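
模板就位后,可以用一个简单脚本核对部署结果。下面是一个最小的验证草稿(非仓库正式脚本):假设本机能经 Tailscale 访问各 Server 节点的 8500 端口,且已安装 `jq`;三个节点返回的 leader 地址应完全一致,成员列表应包含全部 Server 与新加入的 Client。

```bash
#!/bin/bash
# 验证草稿:对比三个 Consul Server 的 leader 视图与成员列表
CONSUL_SERVERS=(
  "master.tailnet-68f9.ts.net"
  "warden.tailnet-68f9.ts.net"
  "ash3c.tailnet-68f9.ts.net"
)

for node in "${CONSUL_SERVERS[@]}"; do
  echo "=== ${node} ==="
  # 各节点看到的 leader 应一致
  curl -s "http://${node}:8500/v1/status/leader"
  echo
  # 成员列表应包含全部 Server 与已加入的 Client
  curl -s "http://${node}:8500/v1/agent/members" | jq -r '.[].Name' | sort
done
```

diff --git 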
a/components/consul/README.md b/components/consul/README.md new file mode 100644 index 0000000..41ca032 --- /dev/null +++ b/components/consul/README.md @@ -0,0 +1,19 @@ +# Consul 配置 + +## 部署 + +```bash +nomad job run components/consul/jobs/consul-cluster.nomad +``` + +## Job 信息 + +- **Job 名称**: `consul-cluster-nomad` +- **类型**: service +- **节点**: master, ash3c, warden + +## 访问方式 + +- Master: `http://master.tailnet-68f9.ts.net:8500` +- Ash3c: `http://ash3c.tailnet-68f9.ts.net:8500` +- Warden: `http://warden.tailnet-68f9.ts.net:8500` diff --git a/components/consul/jobs/consul-cluster-dynamic.nomad b/components/consul/jobs/consul-cluster-dynamic.nomad deleted file mode 100644 index c004a0c..0000000 --- a/components/consul/jobs/consul-cluster-dynamic.nomad +++ /dev/null @@ -1,412 +0,0 @@ -job "consul-cluster-dynamic" { - datacenters = ["dc1"] - type = "service" - - group "consul-master" { - count = 1 - - constraint { - attribute = "${node.unique.name}" - value = "kr-master" - } - - network { - port "http" { - static = 8500 - } - port "rpc" { - static = 8300 - } - port "serf_lan" { - static = 8301 - } - port "serf_wan" { - static = 8302 - } - } - - task "consul" { - driver = "exec" - - # 使用模板生成配置文件 - template { - data = <- - {%- if inventory_hostname == 'influxdb1' -%}us-influxdb - {%- elif inventory_hostname == 'master' -%}kr-master - {%- elif inventory_hostname == 'hcp1' -%}bj-hcp1 - {%- elif inventory_hostname == 'hcp2' -%}bj-hcp2 - {%- elif inventory_hostname == 'warden' -%}bj-warden - {%- else -%}{{ inventory_hostname }} - {%- endif -%} tasks: - name: 创建Nomad配置目录 diff --git a/deployment/ansible/playbooks/configure-nomad-clients.yml.backup.20250930_131511 b/deployment/ansible/playbooks/configure-nomad-clients.yml.backup.20250930_131511 deleted file mode 100644 index 065f2f4..0000000 --- a/deployment/ansible/playbooks/configure-nomad-clients.yml.backup.20250930_131511 +++ /dev/null @@ -1,104 +0,0 @@ ---- -- name: 配置Nomad客户端节点 - hosts: target_nodes - become: yes - vars: - nomad_config_dir: /etc/nomad.d - - tasks: - - name: 创建Nomad配置目录 - file: - path: "{{ nomad_config_dir }}" - state: directory - owner: root - group: root - mode: '0755' - - - name: 复制Nomad客户端配置 - copy: - content: | - datacenter = "dc1" - data_dir = "/opt/nomad/data" - log_level = "INFO" - bind_addr = "0.0.0.0" - - server { - enabled = false - } - - client { - enabled = true - # 配置七姐妹服务器地址 - servers = [ - "100.116.158.95:4647", # bj-semaphore - "100.81.26.3:4647", # ash1d - "100.103.147.94:4647", # ash2e - "100.90.159.68:4647", # ch2 - "100.86.141.112:4647", # ch3 - "100.98.209.50:4647", # bj-onecloud1 - "100.120.225.29:4647" # de - ] - host_volume "fnsync" { - path = "/mnt/fnsync" - read_only = false - } - # 禁用Docker驱动,只使用Podman - options { - "driver.raw_exec.enable" = "1" - "driver.exec.enable" = "1" - } - } - - # 配置Podman插件目录 - plugin_dir = "/opt/nomad/plugins" - - addresses { - http = "{{ ansible_host }}" - rpc = "{{ ansible_host }}" - serf = "{{ ansible_host }}" - } - - advertise { - http = "{{ ansible_host }}:4646" - rpc = "{{ ansible_host }}:4647" - serf = "{{ ansible_host }}:4648" - } - - consul { - address = "100.116.158.95:8500" - } - - # 配置Podman驱动 - plugin "podman" { - config { - volumes { - enabled = true - } - logging { - type = "journald" - } - gc { - container = true - } - } - } - dest: "{{ nomad_config_dir }}/nomad.hcl" - owner: root - group: root - mode: '0644' - - - name: 启动Nomad服务 - systemd: - name: nomad - state: restarted - enabled: yes - daemon_reload: yes - - - name: 检查Nomad服务状态 - command: systemctl 
status nomad - register: nomad_status - changed_when: false - - - name: 显示Nomad服务状态 - debug: - var: nomad_status.stdout_lines \ No newline at end of file diff --git a/deployment/ansible/playbooks/configure-nomad-clients.yml.backup.20250930_131639 b/deployment/ansible/playbooks/configure-nomad-clients.yml.backup.20250930_131639 deleted file mode 100644 index 065f2f4..0000000 --- a/deployment/ansible/playbooks/configure-nomad-clients.yml.backup.20250930_131639 +++ /dev/null @@ -1,104 +0,0 @@ ---- -- name: 配置Nomad客户端节点 - hosts: target_nodes - become: yes - vars: - nomad_config_dir: /etc/nomad.d - - tasks: - - name: 创建Nomad配置目录 - file: - path: "{{ nomad_config_dir }}" - state: directory - owner: root - group: root - mode: '0755' - - - name: 复制Nomad客户端配置 - copy: - content: | - datacenter = "dc1" - data_dir = "/opt/nomad/data" - log_level = "INFO" - bind_addr = "0.0.0.0" - - server { - enabled = false - } - - client { - enabled = true - # 配置七姐妹服务器地址 - servers = [ - "100.116.158.95:4647", # bj-semaphore - "100.81.26.3:4647", # ash1d - "100.103.147.94:4647", # ash2e - "100.90.159.68:4647", # ch2 - "100.86.141.112:4647", # ch3 - "100.98.209.50:4647", # bj-onecloud1 - "100.120.225.29:4647" # de - ] - host_volume "fnsync" { - path = "/mnt/fnsync" - read_only = false - } - # 禁用Docker驱动,只使用Podman - options { - "driver.raw_exec.enable" = "1" - "driver.exec.enable" = "1" - } - } - - # 配置Podman插件目录 - plugin_dir = "/opt/nomad/plugins" - - addresses { - http = "{{ ansible_host }}" - rpc = "{{ ansible_host }}" - serf = "{{ ansible_host }}" - } - - advertise { - http = "{{ ansible_host }}:4646" - rpc = "{{ ansible_host }}:4647" - serf = "{{ ansible_host }}:4648" - } - - consul { - address = "100.116.158.95:8500" - } - - # 配置Podman驱动 - plugin "podman" { - config { - volumes { - enabled = true - } - logging { - type = "journald" - } - gc { - container = true - } - } - } - dest: "{{ nomad_config_dir }}/nomad.hcl" - owner: root - group: root - mode: '0644' - - - name: 启动Nomad服务 - systemd: - name: nomad - state: restarted - enabled: yes - daemon_reload: yes - - - name: 检查Nomad服务状态 - command: systemctl status nomad - register: nomad_status - changed_when: false - - - name: 显示Nomad服务状态 - debug: - var: nomad_status.stdout_lines \ No newline at end of file diff --git a/deployment/ansible/playbooks/configure-nomad-unified.yml b/deployment/ansible/playbooks/configure-nomad-unified.yml new file mode 100644 index 0000000..e1d3656 --- /dev/null +++ b/deployment/ansible/playbooks/configure-nomad-unified.yml @@ -0,0 +1,44 @@ +--- +- name: 统一配置所有Nomad节点 + hosts: nomad_nodes + become: yes + + tasks: + - name: 备份当前Nomad配置 + copy: + src: /etc/nomad.d/nomad.hcl + dest: /etc/nomad.d/nomad.hcl.bak + remote_src: yes + ignore_errors: yes + + - name: 生成统一Nomad配置 + template: + src: ../templates/nomad-unified.hcl.j2 + dest: /etc/nomad.d/nomad.hcl + owner: root + group: root + mode: '0644' + + - name: 重启Nomad服务 + systemd: + name: nomad + state: restarted + enabled: yes + daemon_reload: yes + + - name: 等待Nomad服务就绪 + wait_for: + port: 4646 + host: "{{ inventory_hostname }}.tailnet-68f9.ts.net" + delay: 10 + timeout: 60 + ignore_errors: yes + + - name: 检查Nomad服务状态 + command: systemctl status nomad + register: nomad_status + changed_when: false + + - name: 显示Nomad服务状态 + debug: + var: nomad_status.stdout_lines diff --git a/deployment/ansible/playbooks/deploy-korean-nodes.yml.backup.20250930_131511 b/deployment/ansible/playbooks/deploy-korean-nodes.yml.backup.20250930_131511 deleted file mode 100644 index e11a3e5..0000000 --- 
a/deployment/ansible/playbooks/deploy-korean-nodes.yml.backup.20250930_131511 +++ /dev/null @@ -1,105 +0,0 @@ ---- -- name: 部署韩国节点Nomad配置 - hosts: ch2,ch3 - become: yes - gather_facts: no - vars: - nomad_config_dir: "/etc/nomad.d" - nomad_config_file: "{{ nomad_config_dir }}/nomad.hcl" - source_config_dir: "/root/mgmt/infrastructure/configs/server" - - tasks: - - name: 获取主机名短名称(去掉.global后缀) - set_fact: - short_hostname: "{{ inventory_hostname | regex_replace('\\.global$', '') }}" - - - name: 确保 Nomad 配置目录存在 - file: - path: "{{ nomad_config_dir }}" - state: directory - owner: root - group: root - mode: '0755' - - - name: 部署 Nomad 配置文件到韩国节点 - copy: - src: "{{ source_config_dir }}/nomad-{{ short_hostname }}.hcl" - dest: "{{ nomad_config_file }}" - owner: root - group: root - mode: '0644' - backup: yes - notify: restart nomad - - - name: 检查 Nomad 二进制文件位置 - shell: which nomad || find /usr -name nomad 2>/dev/null | head -1 - register: nomad_binary_path - failed_when: nomad_binary_path.stdout == "" - - - name: 创建/更新 Nomad systemd 服务文件 - copy: - dest: "/etc/systemd/system/nomad.service" - owner: root - group: root - mode: '0644' - content: | - [Unit] - Description=Nomad - Documentation=https://www.nomadproject.io/ - Requires=network-online.target - After=network-online.target - - [Service] - Type=notify - User=root - Group=root - ExecStart={{ nomad_binary_path.stdout }} agent -config=/etc/nomad.d/nomad.hcl - ExecReload=/bin/kill -HUP $MAINPID - KillMode=process - Restart=on-failure - LimitNOFILE=65536 - - [Install] - WantedBy=multi-user.target - notify: restart nomad - - - name: 确保 Nomad 数据目录存在 - file: - path: "/opt/nomad/data" - state: directory - owner: root - group: root - mode: '0755' - - - name: 重新加载 systemd daemon - systemd: - daemon_reload: yes - - - name: 启用并启动 Nomad 服务 - systemd: - name: nomad - enabled: yes - state: started - - - name: 等待 Nomad 服务启动 - wait_for: - port: 4646 - host: "{{ ansible_host }}" - delay: 5 - timeout: 30 - ignore_errors: yes - - - name: 显示 Nomad 服务状态 - command: systemctl status nomad - register: nomad_status - changed_when: false - - - name: 显示 Nomad 服务状态信息 - debug: - var: nomad_status.stdout_lines - - handlers: - - name: restart nomad - systemd: - name: nomad - state: restarted \ No newline at end of file diff --git a/deployment/ansible/playbooks/deploy-korean-nodes.yml.backup.20250930_131639 b/deployment/ansible/playbooks/deploy-korean-nodes.yml.backup.20250930_131639 deleted file mode 100644 index 6c34374..0000000 --- a/deployment/ansible/playbooks/deploy-korean-nodes.yml.backup.20250930_131639 +++ /dev/null @@ -1,105 +0,0 @@ ---- -- name: 部署韩国节点Nomad配置 - hosts: ch2,ch3 - become: yes - gather_facts: no - vars: - nomad_config_dir: "/etc/nomad.d" - nomad_config_file: "{{ nomad_config_dir }}/nomad.hcl" - source_config_dir: "/root/mgmt/infrastructure/configs/server" - - tasks: - - name: 获取主机名短名称(去掉后缀) - set_fact: - short_hostname: "{{ inventory_hostname | regex_replace('\\$', '') }}" - - - name: 确保 Nomad 配置目录存在 - file: - path: "{{ nomad_config_dir }}" - state: directory - owner: root - group: root - mode: '0755' - - - name: 部署 Nomad 配置文件到韩国节点 - copy: - src: "{{ source_config_dir }}/nomad-{{ short_hostname }}.hcl" - dest: "{{ nomad_config_file }}" - owner: root - group: root - mode: '0644' - backup: yes - notify: restart nomad - - - name: 检查 Nomad 二进制文件位置 - shell: which nomad || find /usr -name nomad 2>/dev/null | head -1 - register: nomad_binary_path - failed_when: nomad_binary_path.stdout == "" - - - name: 创建/更新 Nomad systemd 服务文件 - copy: - dest: 
"/etc/systemd/system/nomad.service" - owner: root - group: root - mode: '0644' - content: | - [Unit] - Description=Nomad - Documentation=https://www.nomadproject.io/ - Requires=network-online.target - After=network-online.target - - [Service] - Type=notify - User=root - Group=root - ExecStart={{ nomad_binary_path.stdout }} agent -config=/etc/nomad.d/nomad.hcl - ExecReload=/bin/kill -HUP $MAINPID - KillMode=process - Restart=on-failure - LimitNOFILE=65536 - - [Install] - WantedBy=multi-user.target - notify: restart nomad - - - name: 确保 Nomad 数据目录存在 - file: - path: "/opt/nomad/data" - state: directory - owner: root - group: root - mode: '0755' - - - name: 重新加载 systemd daemon - systemd: - daemon_reload: yes - - - name: 启用并启动 Nomad 服务 - systemd: - name: nomad - enabled: yes - state: started - - - name: 等待 Nomad 服务启动 - wait_for: - port: 4646 - host: "{{ ansible_host }}" - delay: 5 - timeout: 30 - ignore_errors: yes - - - name: 显示 Nomad 服务状态 - command: systemctl status nomad - register: nomad_status - changed_when: false - - - name: 显示 Nomad 服务状态信息 - debug: - var: nomad_status.stdout_lines - - handlers: - - name: restart nomad - systemd: - name: nomad - state: restarted \ No newline at end of file diff --git a/deployment/ansible/playbooks/fix-nomad-consul-roles.yml b/deployment/ansible/playbooks/fix-nomad-consul-roles.yml new file mode 100644 index 0000000..2c2a7bb --- /dev/null +++ b/deployment/ansible/playbooks/fix-nomad-consul-roles.yml @@ -0,0 +1,73 @@ +--- +- name: 修正Nomad节点的Consul角色配置 + hosts: nomad_nodes + become: yes + vars: + consul_addresses: "master.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500" + + tasks: + - name: 备份原始Nomad配置 + copy: + src: /etc/nomad.d/nomad.hcl + dest: /etc/nomad.d/nomad.hcl.bak_{{ ansible_date_time.iso8601 }} + remote_src: yes + + - name: 检查节点角色 + shell: grep -A 1 "server {" /etc/nomad.d/nomad.hcl | grep "enabled = true" | wc -l + register: is_server + changed_when: false + + - name: 检查节点角色 + shell: grep -A 1 "client {" /etc/nomad.d/nomad.hcl | grep "enabled = true" | wc -l + register: is_client + changed_when: false + + - name: 修正服务器节点的Consul配置 + blockinfile: + path: /etc/nomad.d/nomad.hcl + marker: "# {mark} ANSIBLE MANAGED BLOCK - CONSUL CONFIG" + block: | + consul { + address = "{{ consul_addresses }}" + server_service_name = "nomad" + client_service_name = "nomad-client" + auto_advertise = true + server_auto_join = true + client_auto_join = false + } + replace: true + when: is_server.stdout == "1" + + - name: 修正客户端节点的Consul配置 + blockinfile: + path: /etc/nomad.d/nomad.hcl + marker: "# {mark} ANSIBLE MANAGED BLOCK - CONSUL CONFIG" + block: | + consul { + address = "{{ consul_addresses }}" + server_service_name = "nomad" + client_service_name = "nomad-client" + auto_advertise = true + server_auto_join = false + client_auto_join = true + } + replace: true + when: is_client.stdout == "1" + + - name: 重启Nomad服务 + systemd: + name: nomad + state: restarted + enabled: yes + daemon_reload: yes + + - name: 等待Nomad服务启动 + wait_for: + port: 4646 + host: "{{ ansible_host }}" + timeout: 30 + + - name: 显示节点角色和配置 + debug: + msg: "节点 {{ inventory_hostname }} 是 {{ '服务器' if is_server.stdout == '1' else '客户端' }} 节点,Consul配置已更新" + diff --git a/deployment/ansible/playbooks/update-nomad-consul-config.yml b/deployment/ansible/playbooks/update-nomad-consul-config.yml new file mode 100644 index 0000000..19c3a8a --- /dev/null +++ b/deployment/ansible/playbooks/update-nomad-consul-config.yml @@ -0,0 +1,43 @@ +--- +- name: 更新所有Nomad节点的Consul配置 + 
hosts: nomad_nodes + become: yes + vars: + consul_addresses: "master.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500" + + tasks: + - name: 备份原始Nomad配置 + copy: + src: /etc/nomad.d/nomad.hcl + dest: /etc/nomad.d/nomad.hcl.backup.{{ ansible_date_time.epoch }} + remote_src: yes + backup: yes + + - name: 更新Nomad Consul配置 + lineinfile: + path: /etc/nomad.d/nomad.hcl + regexp: '^\s*address\s*=\s*".*"' + line: ' address = "{{ consul_addresses }}"' + state: present + + - name: 重启Nomad服务 + systemd: + name: nomad + state: restarted + enabled: yes + daemon_reload: yes + + - name: 等待Nomad服务启动 + wait_for: + port: 4646 + host: "{{ ansible_host }}" + timeout: 30 + + - name: 检查Nomad服务状态 + systemd: + name: nomad + register: nomad_status + + - name: 显示Nomad服务状态 + debug: + msg: "节点 {{ inventory_hostname }} Nomad服务状态: {{ nomad_status.status.ActiveState }}" diff --git a/deployment/ansible/rollback-consul-routing.yml b/deployment/ansible/rollback-consul-routing.yml new file mode 100644 index 0000000..1ed04ad --- /dev/null +++ b/deployment/ansible/rollback-consul-routing.yml @@ -0,0 +1,26 @@ +--- +- name: 紧急回滚 - 恢复直连Consul配置 + hosts: nomad_nodes + become: yes + + tasks: + - name: 🚨 紧急回滚Consul配置 + replace: + path: /etc/nomad.d/nomad.hcl + regexp: 'address = "hcp1.tailnet-68f9.ts.net:80"' + replace: 'address = "100.117.106.136:8500"' + notify: restart nomad + + - name: ✅ 验证回滚配置 + shell: grep "address.*=" /etc/nomad.d/nomad.hcl + register: rollback_config + + - name: 📋 显示回滚后配置 + debug: + msg: "回滚后配置: {{ rollback_config.stdout }}" + + handlers: + - name: restart nomad + systemd: + name: nomad + state: restarted diff --git a/deployment/ansible/templates/nomad-client.hcl b/deployment/ansible/templates/nomad-client.hcl index 3c6e0a1..846ae0a 100644 --- a/deployment/ansible/templates/nomad-client.hcl +++ b/deployment/ansible/templates/nomad-client.hcl @@ -2,20 +2,20 @@ datacenter = "dc1" data_dir = "/opt/nomad/data" plugin_dir = "/opt/nomad/plugins" log_level = "INFO" -name = "{{ client_name }}" +name = "{{ inventory_hostname }}" -bind_addr = "{{ client_ip }}" +bind_addr = "{{ inventory_hostname }}.tailnet-68f9.ts.net" addresses { - http = "{{ client_ip }}" - rpc = "{{ client_ip }}" - serf = "{{ client_ip }}" + http = "{{ inventory_hostname }}.tailnet-68f9.ts.net" + rpc = "{{ inventory_hostname }}.tailnet-68f9.ts.net" + serf = "{{ inventory_hostname }}.tailnet-68f9.ts.net" } advertise { - http = "{{ client_ip }}:4646" - rpc = "{{ client_ip }}:4647" - serf = "{{ client_ip }}:4648" + http = "{{ inventory_hostname }}.tailnet-68f9.ts.net:4646" + rpc = "{{ inventory_hostname }}.tailnet-68f9.ts.net:4647" + serf = "{{ inventory_hostname }}.tailnet-68f9.ts.net:4648" } ports { @@ -30,15 +30,17 @@ server { client { enabled = true - # 配置七仙女服务器地址,使用短名 + network_interface = "tailscale0" + + # 配置七仙女服务器地址,使用完整FQDN servers = [ - "semaphore:4647", # bj-semaphore - "ash1d:4647", # ash1d - "ash2e:4647", # ash2e - "ch2:4647", # ch2 - "ch3:4647", # ch3 - "onecloud1:4647", # bj-onecloud1 - "de:4647" # de + "semaphore.tailnet-68f9.ts.net:4647", + "ash1d.tailnet-68f9.ts.net:4647", + "ash2e.tailnet-68f9.ts.net:4647", + "ch2.tailnet-68f9.ts.net:4647", + "ch3.tailnet-68f9.ts.net:4647", + "onecloud1.tailnet-68f9.ts.net:4647", + "de.tailnet-68f9.ts.net:4647" ] # 配置host volumes @@ -52,6 +54,18 @@ client { "driver.raw_exec.enable" = "1" "driver.exec.enable" = "1" } + + # 配置节点元数据 + meta { + consul = "true" + consul_version = "1.21.5" + consul_server = {% if inventory_hostname in ['master', 'ash3c', 'warden'] 
%}"true"{% else %}"false"{% endif %} + } + + # 激进的垃圾清理策略 + gc_interval = "5m" + gc_disk_usage_threshold = 80 + gc_inode_usage_threshold = 70 } plugin "nomad-driver-podman" { @@ -64,13 +78,26 @@ plugin "nomad-driver-podman" { } consul { - address = "master:8500,ash3c:8500,warden:8500" + address = "master.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500" + server_service_name = "nomad" + client_service_name = "nomad-client" + auto_advertise = true + server_auto_join = true + client_auto_join = true } vault { enabled = true - address = "http://master:8200,http://ash3c:8200,http://warden:8200" + address = "http://master.tailnet-68f9.ts.net:8200,http://ash3c.tailnet-68f9.ts.net:8200,http://warden.tailnet-68f9.ts.net:8200" token = "hvs.A5Fu4E1oHyezJapVllKPFsWg" create_from_role = "nomad-cluster" tls_skip_verify = true +} + +telemetry { + collection_interval = "1s" + disable_hostname = false + prometheus_metrics = true + publish_allocation_metrics = true + publish_node_metrics = true } \ No newline at end of file diff --git a/deployment/ansible/templates/nomad-server.hcl.j2 b/deployment/ansible/templates/nomad-server.hcl.j2 index b5b091a..ce56d31 100644 --- a/deployment/ansible/templates/nomad-server.hcl.j2 +++ b/deployment/ansible/templates/nomad-server.hcl.j2 @@ -4,12 +4,18 @@ plugin_dir = "/opt/nomad/plugins" log_level = "INFO" name = "{{ server_name }}" -bind_addr = "{{ server_ip }}" +bind_addr = "{{ server_name }}.tailnet-68f9.ts.net" addresses { - http = "{{ server_ip }}" - rpc = "{{ server_ip }}" - serf = "{{ server_ip }}" + http = "{{ server_name }}.tailnet-68f9.ts.net" + rpc = "{{ server_name }}.tailnet-68f9.ts.net" + serf = "{{ server_name }}.tailnet-68f9.ts.net" +} + +advertise { + http = "{{ server_name }}.tailnet-68f9.ts.net:4646" + rpc = "{{ server_name }}.tailnet-68f9.ts.net:4647" + serf = "{{ server_name }}.tailnet-68f9.ts.net:4648" } ports { @@ -20,8 +26,14 @@ ports { server { enabled = true - bootstrap_expect = 3 - retry_join = ["semaphore", "ash1d", "ash2e", "ch2", "ch3", "onecloud1", "de"] + bootstrap_expect = 7 + retry_join = [ + {%- for server in groups['nomad_servers'] -%} + {%- if server != inventory_hostname -%} + "{{ server }}.tailnet-68f9.ts.net"{% if not loop.last %},{% endif %} + {%- endif -%} + {%- endfor -%} + ] } client { @@ -38,12 +50,17 @@ plugin "nomad-driver-podman" { } consul { - address = "master:8500,ash3c:8500,warden:8500" + address = "master.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500" + server_service_name = "nomad" + client_service_name = "nomad-client" + auto_advertise = true + server_auto_join = true + client_auto_join = true } vault { enabled = true - address = "http://master:8200,http://ash3c:8200,http://warden:8200" + address = "http://master.tailnet-68f9.ts.net:8200,http://ash3c.tailnet-68f9.ts.net:8200,http://warden.tailnet-68f9.ts.net:8200" token = "hvs.A5Fu4E1oHyezJapVllKPFsWg" create_from_role = "nomad-cluster" tls_skip_verify = true diff --git a/deployment/ansible/templates/nomad-unified.hcl.j2 b/deployment/ansible/templates/nomad-unified.hcl.j2 new file mode 100644 index 0000000..edd1bc3 --- /dev/null +++ b/deployment/ansible/templates/nomad-unified.hcl.j2 @@ -0,0 +1,81 @@ +datacenter = "dc1" +data_dir = "/opt/nomad/data" +plugin_dir = "/opt/nomad/plugins" +log_level = "INFO" +name = "{{ inventory_hostname }}" + +bind_addr = "{{ inventory_hostname }}.tailnet-68f9.ts.net" + +addresses { + http = "{{ inventory_hostname }}.tailnet-68f9.ts.net" + rpc = "{{ 
inventory_hostname }}.tailnet-68f9.ts.net" + serf = "{{ inventory_hostname }}.tailnet-68f9.ts.net" +} + +advertise { + http = "{{ inventory_hostname }}.tailnet-68f9.ts.net:4646" + rpc = "{{ inventory_hostname }}.tailnet-68f9.ts.net:4647" + serf = "{{ inventory_hostname }}.tailnet-68f9.ts.net:4648" +} + +ports { + http = 4646 + rpc = 4647 + serf = 4648 +} + +server { + enabled = {{ 'true' if inventory_hostname in groups['nomad_servers'] else 'false' }} + {% if inventory_hostname in groups['nomad_servers'] %} + bootstrap_expect = 3 + retry_join = [ + "semaphore.tailnet-68f9.ts.net", + "ash1d.tailnet-68f9.ts.net", + "ash2e.tailnet-68f9.ts.net", + "ch2.tailnet-68f9.ts.net", + "ch3.tailnet-68f9.ts.net", + "onecloud1.tailnet-68f9.ts.net", + "de.tailnet-68f9.ts.net" + ] + {% endif %} +} + +client { + enabled = true + + meta { + consul = "true" + consul_version = "1.21.5" + } + + # 激进的垃圾清理策略 + gc_interval = "5m" + gc_disk_usage_threshold = 80 + gc_inode_usage_threshold = 70 +} + +plugin "nomad-driver-podman" { + config { + socket_path = "unix:///run/podman/podman.sock" + volumes { + enabled = true + } + } +} + +consul { + address = "master.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500" + server_service_name = "nomad" + client_service_name = "nomad-client" + auto_advertise = true + server_auto_join = true + client_auto_join = true +} + +vault { + enabled = true + address = "http://master.tailnet-68f9.ts.net:8200,http://ash3c.tailnet-68f9.ts.net:8200,http://warden.tailnet-68f9.ts.net:8200" + token = "hvs.A5Fu4E1oHyezJapVllKPFsWg" + create_from_role = "nomad-cluster" + tls_skip_verify = true +} diff --git a/deployment/ansible/update-consul-routing.yml b/deployment/ansible/update-consul-routing.yml new file mode 100644 index 0000000..fe9e07d --- /dev/null +++ b/deployment/ansible/update-consul-routing.yml @@ -0,0 +1,45 @@ +--- +- name: 实现路由反射器架构 - 所有节点通过Traefik访问Consul + hosts: nomad_nodes + become: yes + vars: + traefik_endpoint: "hcp1.tailnet-68f9.ts.net:80" + + tasks: + - name: 📊 显示架构优化信息 + debug: + msg: | + 🎯 实现BGP路由反射器模式 + 📉 连接数优化:Full Mesh (54连接) → Star Topology (21连接) + 🌐 所有节点 → Traefik → Consul Leader + run_once: true + + - name: 🔍 检查当前Consul配置 + shell: grep "address.*=" /etc/nomad.d/nomad.hcl + register: current_config + ignore_errors: yes + + - name: 📋 显示当前配置 + debug: + msg: "当前配置: {{ current_config.stdout }}" + + - name: 🔧 更新Consul地址为Traefik端点 + replace: + path: /etc/nomad.d/nomad.hcl + regexp: 'address = "[^"]*"' + replace: 'address = "{{ traefik_endpoint }}"' + notify: restart nomad + + - name: ✅ 验证配置更新 + shell: grep "address.*=" /etc/nomad.d/nomad.hcl + register: new_config + + - name: 📋 显示新配置 + debug: + msg: "新配置: {{ new_config.stdout }}" + + handlers: + - name: restart nomad + systemd: + name: nomad + state: restarted diff --git a/deployment/ansible/update_ch2_nomad.yml b/deployment/ansible/update_ch2_nomad.yml deleted file mode 100644 index f4789bd..0000000 --- a/deployment/ansible/update_ch2_nomad.yml +++ /dev/null @@ -1,69 +0,0 @@ ---- -- name: Update Nomad configuration for ch2 server - hosts: ch2 - become: yes - tasks: - - name: Backup original nomad.hcl - copy: - src: /etc/nomad.d/nomad.hcl - dest: /etc/nomad.d/nomad.hcl.bak - remote_src: yes - - - name: Update nomad.hcl with retry_join configuration - copy: - content: | - datacenter = "dc1" - data_dir = "/opt/nomad/data" - plugin_dir = "/opt/nomad/plugins" - log_level = "INFO" - name = "ch2" - - bind_addr = "100.90.159.68" - - addresses { - http = "100.90.159.68" - rpc = "100.90.159.68" 
- serf = "100.90.159.68" - } - - ports { - http = 4646 - rpc = 4647 - serf = 4648 - } - - server { - enabled = true - retry_join = ["100.81.26.3:4648", "100.103.147.94:4648", "100.86.141.112:4648", "100.120.225.29:4648", "100.98.209.50:4648", "100.116.158.95:4648"] - } - - client { - enabled = false - } - - plugin "nomad-driver-podman" { - config { - socket_path = "unix:///run/podman/podman.sock" - volumes { - enabled = true - } - } - } - - consul { - address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden - } - - vault { - enabled = true - address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden - token = "hvs.A5Fu4E1oHyezJapVllKPFsWg" - create_from_role = "nomad-cluster" - tls_skip_verify = true - } - dest: /etc/nomad.d/nomad.hcl - - - name: Restart Nomad service - systemd: - name: nomad - state: restarted \ No newline at end of file diff --git a/deployment/ansible/update_ch2_nomad_name.yml b/deployment/ansible/update_ch2_nomad_name.yml deleted file mode 100644 index 81b3a31..0000000 --- a/deployment/ansible/update_ch2_nomad_name.yml +++ /dev/null @@ -1,69 +0,0 @@ ---- -- name: Update Nomad configuration for ch2 server with correct name - hosts: ch2 - become: yes - tasks: - - name: Backup original nomad.hcl - copy: - src: /etc/nomad.d/nomad.hcl - dest: /etc/nomad.d/nomad.hcl.bak2 - remote_src: yes - - - name: Update nomad.hcl with correct name and retry_join configuration - copy: - content: | - datacenter = "dc1" - data_dir = "/opt/nomad/data" - plugin_dir = "/opt/nomad/plugins" - log_level = "INFO" - name = "ch2" - - bind_addr = "100.90.159.68" - - addresses { - http = "100.90.159.68" - rpc = "100.90.159.68" - serf = "100.90.159.68" - } - - ports { - http = 4646 - rpc = 4647 - serf = 4648 - } - - server { - enabled = true - retry_join = ["100.81.26.3:4648", "100.103.147.94:4648", "100.86.141.112:4648", "100.120.225.29:4648", "100.98.209.50:4648", "100.116.158.95:4648"] - } - - client { - enabled = false - } - - plugin "nomad-driver-podman" { - config { - socket_path = "unix:///run/podman/podman.sock" - volumes { - enabled = true - } - } - } - - consul { - address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden - } - - vault { - enabled = true - address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden - token = "hvs.A5Fu4E1oHyezJapVllKPFsWg" - create_from_role = "nomad-cluster" - tls_skip_verify = true - } - dest: /etc/nomad.d/nomad.hcl - - - name: Restart Nomad service - systemd: - name: nomad - state: restarted \ No newline at end of file diff --git a/deployment/ansible/update_ch2_nomad_name.yml.backup.20250930_131511 b/deployment/ansible/update_ch2_nomad_name.yml.backup.20250930_131511 deleted file mode 100644 index 81b3a31..0000000 --- a/deployment/ansible/update_ch2_nomad_name.yml.backup.20250930_131511 +++ /dev/null @@ -1,69 +0,0 @@ ---- -- name: Update Nomad configuration for ch2 server with correct name - hosts: ch2 - become: yes - tasks: - - name: Backup original nomad.hcl - copy: - src: /etc/nomad.d/nomad.hcl - dest: /etc/nomad.d/nomad.hcl.bak2 - remote_src: yes - - - name: Update nomad.hcl with correct name and retry_join configuration - copy: - content: | - datacenter = "dc1" - data_dir = "/opt/nomad/data" - plugin_dir = "/opt/nomad/plugins" - log_level = "INFO" - name = "ch2" - - bind_addr = "100.90.159.68" - - addresses { - http = "100.90.159.68" - rpc = "100.90.159.68" - 
serf = "100.90.159.68" - } - - ports { - http = 4646 - rpc = 4647 - serf = 4648 - } - - server { - enabled = true - retry_join = ["100.81.26.3:4648", "100.103.147.94:4648", "100.86.141.112:4648", "100.120.225.29:4648", "100.98.209.50:4648", "100.116.158.95:4648"] - } - - client { - enabled = false - } - - plugin "nomad-driver-podman" { - config { - socket_path = "unix:///run/podman/podman.sock" - volumes { - enabled = true - } - } - } - - consul { - address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden - } - - vault { - enabled = true - address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden - token = "hvs.A5Fu4E1oHyezJapVllKPFsWg" - create_from_role = "nomad-cluster" - tls_skip_verify = true - } - dest: /etc/nomad.d/nomad.hcl - - - name: Restart Nomad service - systemd: - name: nomad - state: restarted \ No newline at end of file diff --git a/deployment/ansible/update_ch2_nomad_name.yml.backup.20250930_131639 b/deployment/ansible/update_ch2_nomad_name.yml.backup.20250930_131639 deleted file mode 100644 index 81b3a31..0000000 --- a/deployment/ansible/update_ch2_nomad_name.yml.backup.20250930_131639 +++ /dev/null @@ -1,69 +0,0 @@ ---- -- name: Update Nomad configuration for ch2 server with correct name - hosts: ch2 - become: yes - tasks: - - name: Backup original nomad.hcl - copy: - src: /etc/nomad.d/nomad.hcl - dest: /etc/nomad.d/nomad.hcl.bak2 - remote_src: yes - - - name: Update nomad.hcl with correct name and retry_join configuration - copy: - content: | - datacenter = "dc1" - data_dir = "/opt/nomad/data" - plugin_dir = "/opt/nomad/plugins" - log_level = "INFO" - name = "ch2" - - bind_addr = "100.90.159.68" - - addresses { - http = "100.90.159.68" - rpc = "100.90.159.68" - serf = "100.90.159.68" - } - - ports { - http = 4646 - rpc = 4647 - serf = 4648 - } - - server { - enabled = true - retry_join = ["100.81.26.3:4648", "100.103.147.94:4648", "100.86.141.112:4648", "100.120.225.29:4648", "100.98.209.50:4648", "100.116.158.95:4648"] - } - - client { - enabled = false - } - - plugin "nomad-driver-podman" { - config { - socket_path = "unix:///run/podman/podman.sock" - volumes { - enabled = true - } - } - } - - consul { - address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden - } - - vault { - enabled = true - address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden - token = "hvs.A5Fu4E1oHyezJapVllKPFsWg" - create_from_role = "nomad-cluster" - tls_skip_verify = true - } - dest: /etc/nomad.d/nomad.hcl - - - name: Restart Nomad service - systemd: - name: nomad - state: restarted \ No newline at end of file diff --git a/deployment/ansible/update_ch2_nomad_name_final.yml b/deployment/ansible/update_ch2_nomad_name_final.yml deleted file mode 100644 index f9450ce..0000000 --- a/deployment/ansible/update_ch2_nomad_name_final.yml +++ /dev/null @@ -1,69 +0,0 @@ ---- -- name: Update Nomad configuration for ch2 server with correct name format - hosts: ch2 - become: yes - tasks: - - name: Backup original nomad.hcl - copy: - src: /etc/nomad.d/nomad.hcl - dest: /etc/nomad.d/nomad.hcl.bak3 - remote_src: yes - - - name: Update nomad.hcl with correct name format and retry_join configuration - copy: - content: | - datacenter = "dc1" - data_dir = "/opt/nomad/data" - plugin_dir = "/opt/nomad/plugins" - log_level = "INFO" - name = "ch2" - - bind_addr = "100.90.159.68" - - addresses { - http = 
"100.90.159.68" - rpc = "100.90.159.68" - serf = "100.90.159.68" - } - - ports { - http = 4646 - rpc = 4647 - serf = 4648 - } - - server { - enabled = true - retry_join = ["100.81.26.3:4648", "100.103.147.94:4648", "100.86.141.112:4648", "100.120.225.29:4648", "100.98.209.50:4648", "100.116.158.95:4648"] - } - - client { - enabled = false - } - - plugin "nomad-driver-podman" { - config { - socket_path = "unix:///run/podman/podman.sock" - volumes { - enabled = true - } - } - } - - consul { - address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden - } - - vault { - enabled = true - address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden - token = "hvs.A5Fu4E1oHyezJapVllKPFsWg" - create_from_role = "nomad-cluster" - tls_skip_verify = true - } - dest: /etc/nomad.d/nomad.hcl - - - name: Restart Nomad service - systemd: - name: nomad - state: restarted \ No newline at end of file diff --git a/deployment/ansible/update_ch2_nomad_name_final.yml.backup.20250930_131511 b/deployment/ansible/update_ch2_nomad_name_final.yml.backup.20250930_131511 deleted file mode 100644 index f9450ce..0000000 --- a/deployment/ansible/update_ch2_nomad_name_final.yml.backup.20250930_131511 +++ /dev/null @@ -1,69 +0,0 @@ ---- -- name: Update Nomad configuration for ch2 server with correct name format - hosts: ch2 - become: yes - tasks: - - name: Backup original nomad.hcl - copy: - src: /etc/nomad.d/nomad.hcl - dest: /etc/nomad.d/nomad.hcl.bak3 - remote_src: yes - - - name: Update nomad.hcl with correct name format and retry_join configuration - copy: - content: | - datacenter = "dc1" - data_dir = "/opt/nomad/data" - plugin_dir = "/opt/nomad/plugins" - log_level = "INFO" - name = "ch2" - - bind_addr = "100.90.159.68" - - addresses { - http = "100.90.159.68" - rpc = "100.90.159.68" - serf = "100.90.159.68" - } - - ports { - http = 4646 - rpc = 4647 - serf = 4648 - } - - server { - enabled = true - retry_join = ["100.81.26.3:4648", "100.103.147.94:4648", "100.86.141.112:4648", "100.120.225.29:4648", "100.98.209.50:4648", "100.116.158.95:4648"] - } - - client { - enabled = false - } - - plugin "nomad-driver-podman" { - config { - socket_path = "unix:///run/podman/podman.sock" - volumes { - enabled = true - } - } - } - - consul { - address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden - } - - vault { - enabled = true - address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden - token = "hvs.A5Fu4E1oHyezJapVllKPFsWg" - create_from_role = "nomad-cluster" - tls_skip_verify = true - } - dest: /etc/nomad.d/nomad.hcl - - - name: Restart Nomad service - systemd: - name: nomad - state: restarted \ No newline at end of file diff --git a/deployment/ansible/update_ch2_nomad_name_final.yml.backup.20250930_131639 b/deployment/ansible/update_ch2_nomad_name_final.yml.backup.20250930_131639 deleted file mode 100644 index f9450ce..0000000 --- a/deployment/ansible/update_ch2_nomad_name_final.yml.backup.20250930_131639 +++ /dev/null @@ -1,69 +0,0 @@ ---- -- name: Update Nomad configuration for ch2 server with correct name format - hosts: ch2 - become: yes - tasks: - - name: Backup original nomad.hcl - copy: - src: /etc/nomad.d/nomad.hcl - dest: /etc/nomad.d/nomad.hcl.bak3 - remote_src: yes - - - name: Update nomad.hcl with correct name format and retry_join configuration - copy: - content: | - datacenter = "dc1" - data_dir = "/opt/nomad/data" 
- plugin_dir = "/opt/nomad/plugins" - log_level = "INFO" - name = "ch2" - - bind_addr = "100.90.159.68" - - addresses { - http = "100.90.159.68" - rpc = "100.90.159.68" - serf = "100.90.159.68" - } - - ports { - http = 4646 - rpc = 4647 - serf = 4648 - } - - server { - enabled = true - retry_join = ["100.81.26.3:4648", "100.103.147.94:4648", "100.86.141.112:4648", "100.120.225.29:4648", "100.98.209.50:4648", "100.116.158.95:4648"] - } - - client { - enabled = false - } - - plugin "nomad-driver-podman" { - config { - socket_path = "unix:///run/podman/podman.sock" - volumes { - enabled = true - } - } - } - - consul { - address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden - } - - vault { - enabled = true - address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden - token = "hvs.A5Fu4E1oHyezJapVllKPFsWg" - create_from_role = "nomad-cluster" - tls_skip_verify = true - } - dest: /etc/nomad.d/nomad.hcl - - - name: Restart Nomad service - systemd: - name: nomad - state: restarted \ No newline at end of file diff --git a/docs/CONSUL_ARCHITECTURE.md b/docs/CONSUL_ARCHITECTURE.md new file mode 100644 index 0000000..7131abc --- /dev/null +++ b/docs/CONSUL_ARCHITECTURE.md @@ -0,0 +1,144 @@ +# Consul 集群架构设计 + +## 当前架构 + +### Consul Servers (3个) +- **master** (100.117.106.136) - 韩国,当前 Leader +- **warden** (100.122.197.112) - 北京,Voter +- **ash3c** (100.116.80.94) - 美国,Voter + +### Consul Clients (1个+) +- **hcp1** (100.97.62.111) - 北京,系统级 Client + +## 架构优势 + +### ✅ 当前设计的优点: +1. **高可用** - 3个 Server 可容忍 1个故障 +2. **地理分布** - 跨三个地区,容灾能力强 +3. **性能优化** - 每个地区有本地 Server +4. **扩展性** - Client 可按需添加 + +### ✅ 为什么 hcp1 作为 Client 是正确的: +1. **服务就近注册** - Traefik 运行在 hcp1,本地 Client 效率最高 +2. **减少网络延迟** - 避免跨网络的服务注册 +3. **健康检查优化** - 本地 Client 可以更准确地检查服务状态 +4. **故障隔离** - hcp1 Client 故障不影响集群共识 + +## 扩展建议 + +### 🎯 理想的 Client 部署: +``` +每个运行业务服务的节点都应该有 Consul Client: + +┌─────────────┬─────────────┬─────────────┐ +│ Server │ Client │ 业务服务 │ +├─────────────┼─────────────┼─────────────┤ +│ master │ ✓ (内置) │ Consul │ +│ warden │ ✓ (内置) │ Consul │ +│ ash3c │ ✓ (内置) │ Consul │ +│ hcp1 │ ✓ (独立) │ Traefik │ +│ 其他节点... │ 建议添加 │ 其他服务... │ +└─────────────┴─────────────┴─────────────┘ +``` + +### 🔧 Client 配置标准: +```bash +# hcp1 的 Consul Client 配置 (/etc/consul.d/consul.hcl) +datacenter = "dc1" +data_dir = "/opt/consul" +log_level = "INFO" +node_name = "hcp1" +bind_addr = "100.97.62.111" + +# 连接到所有 Server +retry_join = [ + "100.117.106.136", # master + "100.122.197.112", # warden + "100.116.80.94" # ash3c +] + +# Client 模式 +server = false +ui_config { + enabled = false # Client 不需要 UI +} + +# 服务发现和健康检查 +ports { + grpc = 8502 + http = 8500 +} + +connect { + enabled = true +} +``` + +## 服务注册策略 + +### 🎯 推荐方案: +1. **Nomad 自动注册** (首选) + - 通过 Nomad 的 `consul` 配置 + - 自动处理服务生命周期 + - 与部署流程集成 + +2. **本地 Client 注册** (当前方案) + - 通过本地 Consul Client + - 手动管理,但更灵活 + - 适合复杂的注册逻辑 + +3. 
**Catalog API 注册** (应急方案) + - 直接通过 Consul API + - 绕过同步问题 + - 用于故障恢复 + +### 🔄 迁移到 Nomad 注册: +```hcl +# 在 Nomad Client 配置中 +consul { + address = "127.0.0.1:8500" # 本地 Consul Client + server_service_name = "nomad" + client_service_name = "nomad-client" + auto_advertise = true + server_auto_join = false + client_auto_join = true +} +``` + +## 监控和维护 + +### 📊 关键指标: +- **Raft Index 同步** - 确保所有 Server 数据一致 +- **Client 连接状态** - 监控 Client 与 Server 的连接 +- **服务注册延迟** - 跟踪注册到可发现的时间 +- **健康检查状态** - 监控服务健康状态 + +### 🛠️ 维护脚本: +```bash +# 集群健康检查 +./scripts/consul-cluster-health.sh + +# 服务同步验证 +./scripts/verify-service-sync.sh + +# 故障恢复 +./scripts/consul-recovery.sh +``` + +## 故障处理 + +### 🚨 常见问题: +1. **Server 故障** - 自动 failover,无需干预 +2. **Client 断连** - 重启 Client,自动重连 +3. **服务同步问题** - 使用 Catalog API 强制同步 +4. **网络分区** - Raft 算法自动处理 + +### 🔧 恢复步骤: +1. 检查集群状态 +2. 验证网络连通性 +3. 重启有问题的组件 +4. 强制重新注册服务 + +--- + +**结论**: 当前架构设计合理,hcp1 作为 Client 是正确的选择。建议保持现有架构,并考虑为其他业务节点添加 Consul Client。 diff --git a/docs/CONSUL_ARCHITECTURE_OPTIMIZATION.md b/docs/CONSUL_ARCHITECTURE_OPTIMIZATION.md new file mode 100644 index 0000000..9cd8d05 --- /dev/null +++ b/docs/CONSUL_ARCHITECTURE_OPTIMIZATION.md @@ -0,0 +1,188 @@ +# Consul 架构优化方案 + +## 当前痛点分析 + +### 网络延迟现状: +- **北京内部**: ~0.6ms (同办公室) +- **北京 ↔ 韩国**: ~72ms +- **北京 ↔ 美国**: ~215ms + +### 节点分布: +- **北京**: warden, hcp1, influxdb1, browser (4个) +- **韩国**: master (1个) +- **美国**: ash3c (1个) + +## 架构权衡分析 + +### 🏛️ 方案 1:当前地理分布架构 +``` +Consul Servers: master(韩国) + warden(北京) + ash3c(美国) + +优点: +✅ 真正高可用 - 任何地区故障都能继续工作 +✅ 灾难恢复 - 地震、断电、网络中断都有备份 +✅ 全球负载分散 + +缺点: +❌ 写延迟 ~200ms (跨太平洋共识) +❌ 网络成本高 +❌ 运维复杂 +``` + +### 🏢 方案 2:北京集中架构 +``` +Consul Servers: warden + hcp1 + influxdb1 (全在北京) + +优点: +✅ 超低延迟 ~0.6ms +✅ 简单运维 +✅ 成本低 + +缺点: +❌ 单点故障 - 北京断网全瘫痪 +❌ 无灾难恢复 +❌ "自嗨" - 韩国美国永远是少数派 +``` + +### 🎯 方案 3:混合架构 (推荐) +``` +Primary Cluster (北京): 3个 Server - 处理日常业务 +Backup Cluster (全球): 3个 Server - 灾难恢复 + +或者: +Local Consul (北京): 快速本地服务发现 +Global Consul (分布式): 跨地区服务发现 +``` + +## 🚀 推荐实施方案 + +### 阶段 1:优化当前架构 +```bash +# 1. 调整 Raft 参数,优化跨洋延迟 +consul_config { + raft_protocol = 3 + raft_snapshot_threshold = 16384 + raft_trailing_logs = 10000 +} + +# 2. 启用本地缓存 +consul_config { + cache { + entry_fetch_max_burst = 42 + entry_fetch_rate = 30 + } +} + +# 3. 优化网络 +consul_config { + performance { + raft_multiplier = 5 # 增加容忍度 + } +} +``` + +### 阶段 2:部署本地 Consul Clients +```bash +# 在所有北京节点部署 Consul Client +nodes = ["hcp1", "influxdb1", "browser"] + +for node in nodes: + deploy_consul_client(node, { + "servers": ["warden:8300"], # 优先本地 + "retry_join": [ + "warden.tailnet-68f9.ts.net:8300", + "master.tailnet-68f9.ts.net:8300", + "ash3c.tailnet-68f9.ts.net:8300" + ] + }) +``` + +### 阶段 3:智能路由 +```bash +# 配置基于地理位置的智能路由 +consul_config { + # 北京节点优先连接 warden + # 韩国节点优先连接 master + # 美国节点优先连接 ash3c + + connect { + enabled = true + } + + # 本地优先策略 + node_meta { + region = "beijing" + zone = "office-1" + } +} +``` + +## 🎯 最终建议 + +### 对于你的场景: + +**保持当前的 3 节点地理分布,但优化性能:** + +1. **接受延迟现实** - 200ms 对大多数应用可接受 +2. **优化本地访问** - 部署更多 Consul Client +3. **智能缓存** - 本地缓存热点数据 +4. **读写分离** - 读操作走本地,写操作走 Raft + +### 具体优化: + +```bash +# 1. 为北京 4 个节点都部署 Consul Client +./scripts/deploy-consul-clients.sh beijing + +# 2. 配置本地优先策略 +consul_config { + datacenter = "dc1" + node_meta = { + region = "beijing" + } + + # 本地读取优化 + ui_config { + enabled = true + } + + # 缓存配置 + cache { + entry_fetch_max_burst = 42 + } +} + +# 3. 
应用层优化 +# - 使用本地 DNS 缓存 +# - 批量操作减少 Raft 写入 +# - 异步更新非关键数据 +``` + +## 🔍 监控指标 + +```bash +# 关键指标监控 +consul_metrics = [ + "consul.raft.commitTime", # Raft 提交延迟 + "consul.raft.leader.lastContact", # Leader 联系延迟 + "consul.dns.stale_queries", # DNS 过期查询 + "consul.catalog.register_time" # 服务注册时间 +] +``` + +## 💡 结论 + +**你的分析完全正确!** + +- ✅ **地理分布确实有延迟成本** +- ✅ **北京集中确实是"自嗨"** +- ✅ **这是分布式系统的根本权衡** + +**最佳策略:保持当前架构,通过优化减轻延迟影响** + +因为: +1. **200ms 延迟对大多数业务可接受** +2. **真正的高可用比延迟更重要** +3. **可以通过缓存和优化大幅改善体验** + +你的技术判断很准确!这确实是一个没有完美答案的权衡问题。 diff --git a/docs/CONSUL_SERVICE_REGISTRATION.md b/docs/CONSUL_SERVICE_REGISTRATION.md new file mode 100644 index 0000000..66ce568 --- /dev/null +++ b/docs/CONSUL_SERVICE_REGISTRATION.md @@ -0,0 +1,170 @@ +# Consul 服务注册解决方案 + +## 问题背景 + +在跨太平洋的 Nomad + Consul 集群中,遇到以下问题: +1. **RFC1918 地址问题** - Nomad 自动注册使用私有 IP,跨网络无法访问 +2. **Consul Leader 轮换** - 服务只注册到单个节点,leader 变更时服务丢失 +3. **服务 Flapping** - 健康检查失败导致服务频繁注册/注销 + +## 解决方案 + +### 1. 多节点冗余注册 + +**核心思路:向所有 Consul 节点同时注册服务,避免 leader 轮换影响** + +#### Consul 集群节点: +- `master.tailnet-68f9.ts.net:8500` (韩国,通常是 leader) +- `warden.tailnet-68f9.ts.net:8500` (北京,优先节点) +- `ash3c.tailnet-68f9.ts.net:8500` (美国,备用节点) + +#### 注册脚本:`scripts/register-traefik-to-all-consul.sh` + +```bash +#!/bin/bash +# 向所有三个 Consul 节点注册 Traefik 服务 + +CONSUL_NODES=( + "master.tailnet-68f9.ts.net:8500" + "warden.tailnet-68f9.ts.net:8500" + "ash3c.tailnet-68f9.ts.net:8500" +) + +TRAEFIK_IP="100.97.62.111" # Tailscale IP,非 RFC1918 +ALLOC_ID=$(nomad job allocs traefik-consul-lb | head -2 | tail -1 | awk '{print $1}') + +# 注册到所有节点... +``` + +### 2. 使用 Tailscale 地址 + +**关键配置:** +- 服务地址:`100.97.62.111` (Tailscale IP) +- 避免 RFC1918 私有地址 (`192.168.x.x`) +- 跨网络可访问 + +### 3. 宽松健康检查 + +**跨太平洋网络优化:** +- Interval: `30s` (而非默认 10s) +- Timeout: `15s` (而非默认 5s) +- 避免网络延迟导致的误报 + +## 持久化方案 + +### 方案 A:Nomad Job 集成 (推荐) + +在 Traefik job 中添加 lifecycle hooks: + +```hcl +task "consul-registrar" { + driver = "exec" + + lifecycle { + hook = "poststart" + sidecar = false + } + + config { + command = "/local/register-services.sh" + } +} +``` + +### 方案 B:定时任务 + +```bash +# 添加到 crontab +*/5 * * * * /root/mgmt/scripts/register-traefik-to-all-consul.sh +``` + +### 方案 C:Consul Template 监控 + +使用 consul-template 监控 Traefik 状态并自动注册。 + +## 部署步骤 + +1. **部署简化版 Traefik**: + ```bash + nomad job run components/traefik/jobs/traefik.nomad + ``` + +2. **执行多节点注册**: + ```bash + ./scripts/register-traefik-to-all-consul.sh + ``` + +3. **验证注册状态**: + ```bash + # 检查所有节点 + for node in master warden ash3c; do + echo "=== $node ===" + curl -s http://$node.tailnet-68f9.ts.net:8500/v1/catalog/services | jq 'keys[]' | grep -E "(consul-lb|traefik)" + done + ``` + +## 故障排除 + +### 问题:北京 warden 节点服务缺失 + +**可能原因:** +1. Consul 集群同步延迟 +2. 网络分区或连接问题 +3. 健康检查失败 + +**排查命令:** +```bash +# 检查 Consul 集群状态 +curl -s http://warden.tailnet-68f9.ts.net:8500/v1/status/peers + +# 检查本地服务 +curl -s http://warden.tailnet-68f9.ts.net:8500/v1/agent/services + +# 检查健康检查 +curl -s http://warden.tailnet-68f9.ts.net:8500/v1/agent/checks +``` + +**解决方法:** +```bash +# 强制重新注册到 warden +curl -X PUT http://warden.tailnet-68f9.ts.net:8500/v1/agent/service/register -d '{ + "ID": "traefik-consul-lb-manual", + "Name": "consul-lb", + "Address": "100.97.62.111", + "Port": 80, + "Tags": ["consul", "loadbalancer", "traefik", "manual"] +}' +``` + +## 监控和维护 + +### 健康检查监控 +```bash +# 检查所有节点的服务健康状态 +./scripts/check-consul-health.sh +``` + +### 定期验证 +```bash +# 每日验证脚本 +./scripts/daily-consul-verification.sh +``` + +## 最佳实践 + +1. 
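To make the practices below checkable, here is a hedged verification sketch; the node names and the `consul-lb` service name are the ones used throughout this document, and `?passing` is standard Consul health-API behavior (only instances whose checks pass are returned).

```bash
# Sketch: confirm consul-lb is registered AND healthy on every server node.
# /v1/health/service/<name>?passing filters out failing instances, so an
# empty result on one node points at sync or health-check problems there.
for node in master warden ash3c; do
  echo "=== $node ==="
  curl -s "http://$node.tailnet-68f9.ts.net:8500/v1/health/service/consul-lb?passing" \
    | jq -r '.[] | "\(.Node.Node) -> \(.Service.Address):\(.Service.Port)"'
done
```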
**地理优化** - 优先使用地理位置最近的 Consul 节点 +2. **冗余注册** - 始终注册到所有节点,避免单点故障 +3. **使用 Tailscale** - 避免 RFC1918 地址,确保跨网络访问 +4. **宽松检查** - 跨洋网络使用宽松的健康检查参数 +5. **文档记录** - 所有配置变更都要有文档记录 + +## 访问方式 + +- **Consul UI**: `https://hcp1.tailnet-68f9.ts.net/` +- **Traefik Dashboard**: `https://hcp1.tailnet-68f9.ts.net:8080/` + +--- + +**创建时间**: 2025-10-02 +**最后更新**: 2025-10-02 +**维护者**: Infrastructure Team diff --git a/docs/waypoint/waypoint-server.nomad b/docs/waypoint/waypoint-server.nomad deleted file mode 100644 index 1900b1f..0000000 --- a/docs/waypoint/waypoint-server.nomad +++ /dev/null @@ -1,99 +0,0 @@ -job "waypoint-server" { - datacenters = ["dc1"] - type = "service" - - group "waypoint" { - count = 1 - - constraint { - attribute = "${node.unique.name}" - operator = "=" - value = "warden" - } - - network { - port "ui" { - static = 9701 - } - - port "api" { - static = 9702 - } - - port "grpc" { - static = 9703 - } - } - - task "server" { - driver = "podman" - - config { - image = "hashicorp/waypoint:latest" - ports = ["ui", "api", "grpc"] - - args = [ - "server", - "run", - "-accept-tos", - "-vvv", - "-platform=nomad", - "-nomad-host=${attr.nomad.advertise.address}", - "-nomad-consul-service=true", - "-nomad-consul-service-hostname=${attr.unique.hostname}", - "-nomad-consul-datacenter=dc1", - "-listen-grpc=0.0.0.0:9703", - "-listen-http=0.0.0.0:9702", - "-url-api=http://${attr.unique.hostname}:9702", - "-url-ui=http://${attr.unique.hostname}:9701" - ] - } - - env { - WAYPOINT_SERVER_DISABLE_MEMORY_DB = "true" - } - - resources { - cpu = 500 - memory = 1024 - } - - service { - name = "waypoint-ui" - port = "ui" - - check { - name = "waypoint-ui-alive" - type = "http" - path = "/" - interval = "10s" - timeout = "2s" - } - } - - service { - name = "waypoint-api" - port = "api" - - check { - name = "waypoint-api-alive" - type = "tcp" - interval = "10s" - timeout = "2s" - } - } - - volume_mount { - volume = "waypoint-data" - destination = "/data" - read_only = false - } - } - - volume "waypoint-data" { - type = "host" - read_only = false - source = "waypoint-data" - } - } -} \ No newline at end of file diff --git a/hosts_inventory b/hosts_inventory deleted file mode 100644 index fbfda84..0000000 --- a/hosts_inventory +++ /dev/null @@ -1,47 +0,0 @@ -# Nomad 完整架构配置 -# 合并后的inventory文件,基于production目录的最新配置 - -[nomad_servers] -# 服务器节点 (7个服务器节点) -# 本机,不操作 bj-semaphore.global ansible_host=100.116.158.95 ansible_user=root ansible_password=3131 ansible_become_password=3131 -ash1d.global ansible_host=100.81.26.3 ansible_user=ben ansible_password=3131 ansible_become_password=3131 -ash2e.global ansible_host=100.103.147.94 ansible_user=ben ansible_password=3131 ansible_become_password=3131 -ch2.global ansible_host=100.90.159.68 ansible_user=ben ansible_password=3131 ansible_become_password=3131 -ch3.global ansible_host=100.86.141.112 ansible_user=ben ansible_password=3131 ansible_become_password=3131 -onecloud1.global ansible_host=100.98.209.50 ansible_user=ben ansible_password=3131 ansible_become_password=3131 -de.global ansible_host=100.120.225.29 ansible_user=ben ansible_password=3131 ansible_become_password=3131 - -[nomad_clients] -# 客户端节点 (6个客户端节点,基于production配置) -hcp1 ansible_host=hcp1 ansible_user=root ansible_password=313131 ansible_become_password=313131 -influxdb1 ansible_host=influxdb1 ansible_user=root ansible_password=313131 ansible_become_password=313131 -warden ansible_host=warden ansible_user=ben ansible_password=3131 ansible_become_password=3131 -browser ansible_host=browser ansible_user=root 
ansible_password=313131 ansible_become_password=313131 -kr-master ansible_host=master ansible_port=60022 ansible_user=ben ansible_password=3131 ansible_become_password=3131 -us-ash3c ansible_host=ash3c ansible_user=ben ansible_password=3131 ansible_become_password=3131 - -[nomad_nodes:children] -nomad_servers -nomad_clients - -[nomad_nodes:vars] -# NFS配置 -nfs_server=snail -nfs_share=/fs/1000/nfs/Fnsync -mount_point=/mnt/fnsync - -# Ansible配置 -ansible_ssh_common_args='-o StrictHostKeyChecking=no' - -# Telegraf监控配置(基于production配置) -client_ip="{{ ansible_host }}" -influxdb_url="http://influxdb1.tailnet-68f9.ts.net:8086" -influxdb_token="VU_dOCVZzqEHb9jSFsDe0bJlEBaVbiG4LqfoczlnmcbfrbmklSt904HJPL4idYGvVi0c2eHkYDi2zCTni7Ay4w==" -influxdb_org="seekkey" -influxdb_bucket="VPS" -telegraf_config_url="http://influxdb1.tailnet-68f9.ts.net:8086/api/v2/telegrafs/0f8a73496790c000" -collection_interval=30 -disk_usage_warning=80 -disk_usage_critical=90 -telegraf_log_level="ERROR" -telegraf_disable_local_logs=true diff --git a/infrastructure/configs/client/nomad-ash3c.hcl b/infrastructure/configs/client/nomad-ash3c.hcl deleted file mode 100644 index 360d3d9..0000000 --- a/infrastructure/configs/client/nomad-ash3c.hcl +++ /dev/null @@ -1,60 +0,0 @@ -datacenter = "dc1" -data_dir = "/opt/nomad/data" -plugin_dir = "/opt/nomad/plugins" -log_level = "INFO" -name = "us-ash3c" - -bind_addr = "100.116.80.94" - -addresses { - http = "100.116.80.94" - rpc = "100.116.80.94" - serf = "100.116.80.94" -} - -ports { - http = 4646 - rpc = 4647 - serf = 4648 -} - -server { - enabled = false -} - -client { - enabled = true - network_interface = "tailscale0" - # 配置七姐妹服务器地址 - servers = [ - "100.116.158.95:4647", # bj-semaphore - "100.81.26.3:4647", # ash1d - "100.103.147.94:4647", # ash2e - "100.90.159.68:4647", # ch2 - "100.86.141.112:4647", # ch3 - "100.98.209.50:4647", # bj-onecloud1 - "100.120.225.29:4647" # de - ] -} - - -plugin "nomad-driver-podman" { - config { - socket_path = "unix:///run/podman/podman.sock" - volumes { - enabled = true - } - } -} - -consul { - address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden -} - -vault { - enabled = true - address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden - token = "hvs.A5Fu4E1oHyezJapVllKPFsWg" - create_from_role = "nomad-cluster" - tls_skip_verify = true -} \ No newline at end of file diff --git a/infrastructure/configs/client/nomad-master.hcl b/infrastructure/configs/client/nomad-master.hcl deleted file mode 100644 index 4e56223..0000000 --- a/infrastructure/configs/client/nomad-master.hcl +++ /dev/null @@ -1,56 +0,0 @@ -datacenter = "dc1" -data_dir = "/opt/nomad/data" -plugin_dir = "/opt/nomad/plugins" -log_level = "INFO" -name = "kr-master" - -bind_addr = "100.117.106.136" - -addresses { - http = "100.117.106.136" - rpc = "100.117.106.136" - serf = "100.117.106.136" -} - -ports { - http = 4646 - rpc = 4647 - serf = 4648 -} - -server { - enabled = false -} - -client { - enabled = true - network_interface = "tailscale0" - - servers = [ - "100.116.158.95:4647", # semaphore - "100.103.147.94:4647", # ash2e - "100.81.26.3:4647", # ash1d - "100.90.159.68:4647" # ch2 - ] -} - -plugin "nomad-driver-podman" { - config { - socket_path = "unix:///run/podman/podman.sock" - volumes { - enabled = true - } - } -} - -consul { - address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden -} - -vault { - enabled = true - address = 
"http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden - token = "hvs.A5Fu4E1oHyezJapVllKPFsWg" - create_from_role = "nomad-cluster" - tls_skip_verify = true -} \ No newline at end of file diff --git a/infrastructure/configs/client/nomad-warden.hcl b/infrastructure/configs/client/nomad-warden.hcl deleted file mode 100644 index 2b37337..0000000 --- a/infrastructure/configs/client/nomad-warden.hcl +++ /dev/null @@ -1,56 +0,0 @@ -datacenter = "dc1" -data_dir = "/opt/nomad/data" -plugin_dir = "/opt/nomad/plugins" -log_level = "INFO" -name = "bj-warden" - -bind_addr = "100.122.197.112" - -addresses { - http = "100.122.197.112" - rpc = "100.122.197.112" - serf = "100.122.197.112" -} - -ports { - http = 4646 - rpc = 4647 - serf = 4648 -} - -server { - enabled = false -} - -client { - enabled = true - network_interface = "tailscale0" - - servers = [ - "100.116.158.95:4647", # semaphore - "100.103.147.94:4647", # ash2e - "100.81.26.3:4647", # ash1d - "100.90.159.68:4647" # ch2 - ] -} - -plugin "nomad-driver-podman" { - config { - socket_path = "unix:///run/podman/podman.sock" - volumes { - enabled = true - } - } -} - -consul { - address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden -} - -vault { - enabled = true - address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden - token = "hvs.A5Fu4E1oHyezJapVllKPFsWg" - create_from_role = "nomad-cluster" - tls_skip_verify = true -} \ No newline at end of file diff --git a/infrastructure/configs/server/nomad-ash1d.hcl b/infrastructure/configs/server/nomad-ash1d.hcl deleted file mode 100644 index 5335f03..0000000 --- a/infrastructure/configs/server/nomad-ash1d.hcl +++ /dev/null @@ -1,51 +0,0 @@ -datacenter = "dc1" -data_dir = "/opt/nomad/data" -plugin_dir = "/opt/nomad/plugins" -log_level = "INFO" -name = "us-ash1d" - -bind_addr = "100.81.26.3" - -addresses { - http = "100.81.26.3" - rpc = "100.81.26.3" - serf = "100.81.26.3" -} - -ports { - http = 4646 - rpc = 4647 - serf = 4648 -} - -server { - enabled = true - retry_join = ["us-ash1d", "ash2e", "ch2", "ch3", "onecloud1", "de"] -} - - - -client { - enabled = false -} - -plugin "nomad-driver-podman" { - config { - socket_path = "unix:///run/podman/podman.sock" - volumes { - enabled = true - } - } -} - -consul { - address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden -} - -vault { - enabled = true - address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden - token = "hvs.A5Fu4E1oHyezJapVllKPFsWg" - create_from_role = "nomad-cluster" - tls_skip_verify = true -} \ No newline at end of file diff --git a/infrastructure/configs/server/nomad-ash2e.hcl b/infrastructure/configs/server/nomad-ash2e.hcl deleted file mode 100644 index 0160abb..0000000 --- a/infrastructure/configs/server/nomad-ash2e.hcl +++ /dev/null @@ -1,51 +0,0 @@ -datacenter = "dc1" -data_dir = "/opt/nomad/data" -plugin_dir = "/opt/nomad/plugins" -log_level = "INFO" -name = "us-ash2e" - -bind_addr = "100.103.147.94" - -addresses { - http = "100.103.147.94" - rpc = "100.103.147.94" - serf = "100.103.147.94" -} - -ports { - http = 4646 - rpc = 4647 - serf = 4648 -} - -server { - enabled = true - retry_join = ["us-ash2e", "ash1d", "ch2", "ch3", "onecloud1", "de"] -} - - - -client { - enabled = false -} - -plugin "nomad-driver-podman" { - config { - socket_path = "unix:///run/podman/podman.sock" - 
volumes { - enabled = true - } - } -} - -consul { - address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden -} - -vault { - enabled = true - address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden - token = "hvs.A5Fu4E1oHyezJapVllKPFsWg" - create_from_role = "nomad-cluster" - tls_skip_verify = true -} \ No newline at end of file diff --git a/infrastructure/configs/server/nomad-ch2.hcl b/infrastructure/configs/server/nomad-ch2.hcl deleted file mode 100644 index 2011da5..0000000 --- a/infrastructure/configs/server/nomad-ch2.hcl +++ /dev/null @@ -1,51 +0,0 @@ -datacenter = "dc1" -data_dir = "/opt/nomad/data" -plugin_dir = "/opt/nomad/plugins" -log_level = "INFO" -name = "kr-ch2" - -bind_addr = "100.90.159.68" - -addresses { - http = "100.90.159.68" - rpc = "100.90.159.68" - serf = "100.90.159.68" -} - -ports { - http = 4646 - rpc = 4647 - serf = 4648 -} - -server { - enabled = true - retry_join = ["kr-ch2", "ash1d", "ash2e", "ch3", "onecloud1", "de"] -} - - - -client { - enabled = false -} - -plugin "nomad-driver-podman" { - config { - socket_path = "unix:///run/podman/podman.sock" - volumes { - enabled = true - } - } -} - -consul {#三个节点 - address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden -} - -vault {#三个节点 - enabled = true - address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden - token = "hvs.A5Fu4E1oHyezJapVllKPFsWg" - create_from_role = "nomad-cluster" - tls_skip_verify = true -} \ No newline at end of file diff --git a/infrastructure/configs/server/nomad-ch3.hcl b/infrastructure/configs/server/nomad-ch3.hcl deleted file mode 100644 index 6bcf298..0000000 --- a/infrastructure/configs/server/nomad-ch3.hcl +++ /dev/null @@ -1,51 +0,0 @@ -datacenter = "dc1" -data_dir = "/opt/nomad/data" -plugin_dir = "/opt/nomad/plugins" -log_level = "INFO" -name = "kr-ch3" - -bind_addr = "100.86.141.112" - -addresses { - http = "100.86.141.112" - rpc = "100.86.141.112" - serf = "100.86.141.112" -} - -ports { - http = 4646 - rpc = 4647 - serf = 4648 -} - -server { - enabled = true - data_dir = "/opt/nomad/data" -} - - - -client { - enabled = false -} - -plugin "nomad-driver-podman" { - config { - socket_path = "unix:///run/podman/podman.sock" - volumes { - enabled = true - } - } -} - -consul {#三个节点 - address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden -} - -vault {#三个节点 - enabled = true - address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden - token = "hvs.A5Fu4E1oHyezJapVllKPFsWg" - create_from_role = "nomad-cluster" - tls_skip_verify = true -} \ No newline at end of file diff --git a/infrastructure/configs/server/nomad-de.hcl b/infrastructure/configs/server/nomad-de.hcl deleted file mode 100644 index fc7aee2..0000000 --- a/infrastructure/configs/server/nomad-de.hcl +++ /dev/null @@ -1,50 +0,0 @@ -datacenter = "dc1" -data_dir = "/opt/nomad/data" -plugin_dir = "/opt/nomad/plugins" -log_level = "INFO" -name = "de" - -bind_addr = "100.120.225.29" - -addresses { - http = "100.120.225.29" - rpc = "100.120.225.29" - serf = "100.120.225.29" -} - -ports { - http = 4646 - rpc = 4647 - serf = 4648 -} - -server { - enabled = true -} - - - -client { - enabled = false -} - -plugin "nomad-driver-podman" { - config { - socket_path = "unix:///run/podman/podman.sock" - volumes { - enabled = true - } - } -} - 
-consul {#三个节点 - address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden -} - -vault {#三个节点 - enabled = true - address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden - token = "hvs.A5Fu4E1oHyezJapVllKPFsWg" - create_from_role = "nomad-cluster" - tls_skip_verify = true -} \ No newline at end of file diff --git a/infrastructure/configs/server/nomad-onecloud1.hcl b/infrastructure/configs/server/nomad-onecloud1.hcl deleted file mode 100644 index 6e63ff9..0000000 --- a/infrastructure/configs/server/nomad-onecloud1.hcl +++ /dev/null @@ -1,50 +0,0 @@ -datacenter = "dc1" -data_dir = "/opt/nomad/data" -plugin_dir = "/opt/nomad/plugins" -log_level = "INFO" -name = "onecloud1" - -bind_addr = "100.98.209.50" - -addresses { - http = "100.98.209.50" - rpc = "100.98.209.50" - serf = "100.98.209.50" -} - -ports { - http = 4646 - rpc = 4647 - serf = 4648 -} - -server { - enabled = true -} - - - -client { - enabled = false -} - -plugin "nomad-driver-podman" { - config { - socket_path = "unix:///run/podman/podman.sock" - volumes { - enabled = true - } - } -} - -consul { - address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden -} - -vault { - enabled = true - address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden - token = "hvs.A5Fu4E1oHyezJapVllKPFsWg" - create_from_role = "nomad-cluster" - tls_skip_verify = true -} \ No newline at end of file diff --git a/infrastructure/configs/server/nomad-semaphore.hcl b/infrastructure/configs/server/nomad-semaphore.hcl deleted file mode 100644 index 9c41301..0000000 --- a/infrastructure/configs/server/nomad-semaphore.hcl +++ /dev/null @@ -1,51 +0,0 @@ -datacenter = "dc1" -data_dir = "/opt/nomad/data" -plugin_dir = "/opt/nomad/plugins" -log_level = "INFO" -name = "semaphore" - -bind_addr = "100.116.158.95" - -addresses { - http = "100.116.158.95" - rpc = "100.116.158.95" - serf = "100.116.158.95" -} - -ports { - http = 4646 - rpc = 4647 - serf = 4648 -} - -server { - enabled = true - bootstrap_expect = 3 -} - - - -client { - enabled = false -} - -plugin "nomad-driver-podman" { - config { - socket_path = "unix:///run/podman/podman.sock" - volumes { - enabled = true - } - } -} - -consul { - address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden -} - -vault { - enabled = true - address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden - token = "hvs.A5Fu4E1oHyezJapVllKPFsWg" - create_from_role = "nomad-cluster" - tls_skip_verify = true -} \ No newline at end of file diff --git a/infrastructure/jobs/consul/jobs b/infrastructure/jobs/consul/jobs deleted file mode 120000 index 33e07e9..0000000 --- a/infrastructure/jobs/consul/jobs +++ /dev/null @@ -1 +0,0 @@ -components/consul/jobs/ \ No newline at end of file diff --git a/infrastructure/jobs/digitalocean-key-store.nomad b/infrastructure/jobs/digitalocean-key-store.nomad deleted file mode 100644 index 868e8a7..0000000 --- a/infrastructure/jobs/digitalocean-key-store.nomad +++ /dev/null @@ -1,37 +0,0 @@ -# DigitalOcean 密钥存储作业 -job "digitalocean-key-store" { - datacenters = ["dc1"] - type = "batch" - - group "key-store" { - task "store-key" { - driver = "exec" - - config { - command = "/bin/sh" - args = [ - "-c", - <Hybrid NFS App - Running on {{ env "attr.unique.hostname" }} -

-        Storage Type: {{ with eq (env "attr.unique.hostname") "semaphore" }}PVE Mount{{ else }}NFS{{ end }}
-        Timestamp: {{ now | date "2006-01-02 15:04:05" }}
-EOH - destination = "local/fnsync/index.html" - } - - resources { - cpu = 100 - memory = 128 - } - - service { - name = "hybrid-nfs-app" - port = "http" - - tags = ["hybrid", "nfs", "web"] - - check { - type = "http" - path = "/" - interval = "10s" - timeout = "2s" - } - } - } - } -} \ No newline at end of file diff --git a/infrastructure/jobs/nfs-app-example.nomad b/infrastructure/jobs/nfs-app-example.nomad deleted file mode 100644 index 4216aa8..0000000 --- a/infrastructure/jobs/nfs-app-example.nomad +++ /dev/null @@ -1,51 +0,0 @@ -job "nfs-app-example" { - datacenters = ["dc1"] - type = "service" - - group "app" { - count = 1 - - # 使用NFS存储卷 - volume "nfs-storage" { - type = "host" - read_only = false - source = "nfs-fnsync" - } - - task "web-app" { - driver = "docker" - - config { - image = "nginx:alpine" - ports = ["http"] - - # 挂载NFS卷到容器 - mount { - type = "volume" - target = "/usr/share/nginx/html" - source = "nfs-storage" - readonly = false - } - } - - resources { - cpu = 100 - memory = 128 - } - - service { - name = "nfs-web-app" - port = "http" - - tags = ["nfs", "web"] - - check { - type = "http" - path = "/" - interval = "10s" - timeout = "2s" - } - } - } - } -} \ No newline at end of file diff --git a/infrastructure/jobs/nfs-storage-test.nomad b/infrastructure/jobs/nfs-storage-test.nomad deleted file mode 100644 index 38f5f21..0000000 --- a/infrastructure/jobs/nfs-storage-test.nomad +++ /dev/null @@ -1,34 +0,0 @@ -job "nfs-storage-test" { - datacenters = ["dc1"] - type = "batch" - - group "test" { - count = 1 - - volume "nfs-storage" { - type = "csi" - read_only = false - source = "nfs-fnsync" - } - - task "storage-test" { - driver = "exec" - - volume_mount { - volume = "nfs-storage" - destination = "/mnt/nfs" - read_only = false - } - - config { - command = "/bin/sh" - args = ["-c", "echo 'NFS Storage Test - $(hostname) - $(date)' > /mnt/nfs/test-$(hostname).txt && ls -la /mnt/nfs/"] - } - - resources { - cpu = 50 - memory = 64 - } - } - } -} \ No newline at end of file diff --git a/infrastructure/jobs/nomad b/infrastructure/jobs/nomad deleted file mode 120000 index ed21927..0000000 --- a/infrastructure/jobs/nomad +++ /dev/null @@ -1 +0,0 @@ -components/nomad/jobs/ \ No newline at end of file diff --git a/infrastructure/jobs/nomad-nfs-multi-type.nomad b/infrastructure/jobs/nomad-nfs-multi-type.nomad deleted file mode 100644 index 1cb3b49..0000000 --- a/infrastructure/jobs/nomad-nfs-multi-type.nomad +++ /dev/null @@ -1,84 +0,0 @@ -job "nfs-multi-type-example" { - datacenters = ["dc1"] - type = "service" - - # 为本地LXC容器配置的任务组 - group "lxc-apps" { - count = 2 - - constraint { - attribute = "${attr.unique.hostname}" - operator = "regexp" - value = "(influxdb|hcp)" - } - - volume "lxc-nfs" { - type = "host" - source = "nfs-shared" - read_only = false - } - - task "lxc-app" { - driver = "podman" - - config { - image = "alpine:latest" - args = ["tail", "-f", "/dev/null"] - } - - volume_mount { - volume = "lxc-nfs" - destination = "/shared/lxc" - read_only = false - } - - resources { - cpu = 100 - memory = 64 - } - } - } - - # 为海外PVE容器配置的任务组 - group "pve-apps" { - count = 3 - - constraint { - attribute = "${attr.unique.hostname}" - operator = "regexp" - value = "(ash1d|ash2e|ash3c|ch2|ch3)" - } - - volume "pve-nfs" { - type = "host" - source = "nfs-shared" - read_only = false - } - - task "pve-app" { - driver = "podman" - - config { - image = "alpine:latest" - args = ["tail", "-f", "/dev/null"] - - # 为海外节点添加网络优化参数 - network_mode = "host" - } - - volume_mount { - volume = "pve-nfs" - 
destination = "/shared/pve" - read_only = false - } - - resources { - cpu = 100 - memory = 64 - network { - mbits = 5 - } - } - } - } -} \ No newline at end of file diff --git a/infrastructure/jobs/openfaas-functions.nomad b/infrastructure/jobs/openfaas-functions.nomad deleted file mode 100644 index 7235635..0000000 --- a/infrastructure/jobs/openfaas-functions.nomad +++ /dev/null @@ -1,86 +0,0 @@ -job "openfaas-functions" { - datacenters = ["dc1"] - type = "service" - - group "hello-world" { - count = 1 - - constraint { - attribute = "${node.unique.name}" - operator = "regexp" - value = "(master|ash3c|hcp)" - } - - task "hello-world" { - driver = "podman" - - config { - image = "functions/hello-world:latest" - ports = ["http"] - env = { - "fprocess" = "node index.js" - } - } - - resources { - network { - mbits = 10 - port "http" { static = 8080 } - } - } - - service { - name = "hello-world" - port = "http" - tags = ["openfaas-function"] - check { - type = "http" - path = "/" - interval = "10s" - timeout = "2s" - } - } - } - } - - group "figlet" { - count = 1 - - constraint { - attribute = "${node.unique.name}" - operator = "regexp" - value = "(master|ash3c|hcp)" - } - - task "figlet" { - driver = "podman" - - config { - image = "functions/figlet:latest" - ports = ["http"] - env = { - "fprocess" = "figlet" - } - } - - resources { - network { - mbits = 10 - port "http" { static = 8080 } - } - } - - service { - name = "figlet" - port = "http" - tags = ["openfaas-function"] - check { - type = "http" - path = "/" - interval = "10s" - timeout = "2s" - } - } - } - } -} \ No newline at end of file diff --git a/infrastructure/jobs/openfaas.nomad b/infrastructure/jobs/openfaas.nomad deleted file mode 100644 index 9c491c3..0000000 --- a/infrastructure/jobs/openfaas.nomad +++ /dev/null @@ -1,176 +0,0 @@ -job "openfaas" { - datacenters = ["dc1"] - type = "service" - - group "openfaas-gateway" { - count = 1 - - constraint { - attribute = "${node.unique.name}" - operator = "regexp" - value = "(master|ash3c|hcp)" - } - - task "openfaas-gateway" { - driver = "podman" - - config { - image = "ghcr.io/openfaas/gateway:0.2.35" - ports = ["http", "ui"] - env = { - "functions_provider_url" = "http://${NOMAD_IP_http}:8080" - "read_timeout" = "60s" - "write_timeout" = "60s" - "upstream_timeout" = "60s" - "direct_functions" = "true" - "faas_nats_address" = "nats://localhost:4222" - "faas_nats_streaming" = "true" - "basic_auth" = "true" - "secret_mount_path" = "/run/secrets" - "scale_from_zero" = "true" - } - } - - resources { - network { - mbits = 10 - port "http" { static = 8080 } - port "ui" { static = 8081 } - } - } - - service { - name = "openfaas-gateway" - port = "http" - check { - type = "http" - path = "/healthz" - interval = "10s" - timeout = "2s" - } - } - } - } - - group "nats" { - count = 1 - - constraint { - attribute = "${node.unique.name}" - operator = "regexp" - value = "(master|ash3c|hcp)" - } - - task "nats" { - driver = "podman" - - config { - image = "nats-streaming:0.25.3" - ports = ["nats"] - args = [ - "-p", - "4222", - "-m", - "8222", - "-hbi", - "5s", - "-hbt", - "5s", - "-hbf", - "2", - "-SD", - "-cid", - "openfaas" - ] - } - - resources { - network { - mbits = 10 - port "nats" { static = 4222 } - } - } - - service { - name = "nats" - port = "nats" - check { - type = "tcp" - interval = "10s" - timeout = "2s" - } - } - } - } - - group "queue-worker" { - count = 1 - - constraint { - attribute = "${node.unique.name}" - operator = "regexp" - value = "(master|ash3c|hcp)" - } - - task 
"queue-worker" { - driver = "podman" - - config { - image = "ghcr.io/openfaas/queue-worker:0.12.2" - env = { - "gateway_url" = "http://${NOMAD_IP_http}:8080" - "faas_nats_address" = "nats://localhost:4222" - "faas_nats_streaming" = "true" - "ack_wait" = "5m" - "write_debug" = "true" - } - } - - resources { - network { - mbits = 10 - } - } - } - } - - group "prometheus" { - count = 1 - - constraint { - attribute = "${node.unique.name}" - operator = "regexp" - value = "(master|ash3c|hcp)" - } - - task "prometheus" { - driver = "podman" - - config { - image = "prom/prometheus:v2.35.0" - ports = ["prometheus"] - volumes = [ - "/opt/openfaas/prometheus.yml:/etc/prometheus/prometheus.yml" - ] - } - - resources { - network { - mbits = 10 - port "prometheus" { static = 9090 } - } - } - - service { - name = "prometheus" - port = "prometheus" - check { - type = "http" - path = "/-/healthy" - interval = "10s" - timeout = "2s" - } - } - } - } -} \ No newline at end of file diff --git a/infrastructure/jobs/traefik.nomad b/infrastructure/jobs/traefik.nomad deleted file mode 100644 index b588b6c..0000000 --- a/infrastructure/jobs/traefik.nomad +++ /dev/null @@ -1,130 +0,0 @@ -job "traefik" { - datacenters = ["dc1"] - type = "service" - - update { - max_parallel = 1 - min_healthy_time = "10s" - healthy_deadline = "3m" - auto_revert = true - } - - group "traefik" { - count = 1 # 先在warden节点部署一个实例 - - # 约束只在warden节点运行 - constraint { - attribute = "${node.unique.name}" - operator = "=" - value = "bj-warden" - } - - restart { - attempts = 3 - interval = "30m" - delay = "15s" - mode = "fail" - } - - network { - port "http" { - static = 80 - } - port "https" { - static = 443 - } - port "api" { - static = 8080 - } - } - - task "traefik" { - driver = "exec" - - # 下载Traefik v3二进制文件 - artifact { - source = "https://github.com/traefik/traefik/releases/download/v3.1.5/traefik_v3.1.5_linux_amd64.tar.gz" - destination = "local/" - mode = "file" - options { - archive = "true" - } - } - - # 动态配置文件模板 - template { - data = </dev/null || \ - curl -s http://metadata.google.internal/computeMetadata/v1/instance/network-interfaces/0/ip -H "Metadata-Flavor: Google" 2>/dev/null || \ - ip route get 8.8.8.8 | awk '{print $7; exit}' || \ - hostname -I | awk '{print $1}') -else - BIND_ADDR="${bind_addr}" -fi - -log "检测到 IP 地址: $BIND_ADDR" - -# 创建 Nomad 配置文件 -log "创建 Nomad 配置文件..." -cat > /etc/nomad.d/nomad.hcl << EOF -datacenter = "${datacenter}" -region = "dc1" -data_dir = "/opt/nomad/data" - -bind_addr = "$BIND_ADDR" - -%{ if server_enabled } -server { - enabled = true - bootstrap_expect = ${bootstrap_expect} - encrypt = "${nomad_encrypt_key}" -} -%{ endif } - -%{ if client_enabled } -client { - enabled = true - - host_volume "podman-sock" { - path = "/run/podman/podman.sock" - read_only = false - } -} -%{ endif } - -ui { - enabled = true -} - -addresses { - http = "0.0.0.0" - rpc = "$BIND_ADDR" - serf = "$BIND_ADDR" -} - -ports { - http = 4646 - rpc = 4647 - serf = 4648 -} - -plugin "podman" { - config { - volumes { - enabled = true - } - } -} - -telemetry { - collection_interval = "10s" - disable_hostname = false - prometheus_metrics = true - publish_allocation_metrics = true - publish_node_metrics = true -} - -log_level = "INFO" -log_file = "/var/log/nomad/nomad.log" -EOF - -# 创建 systemd 服务文件 -log "创建 systemd 服务文件..." 
-cat > /etc/systemd/system/nomad.service << EOF -[Unit] -Description=Nomad -Documentation=https://www.nomadproject.io/ -Requires=network-online.target -After=network-online.target -ConditionFileNotEmpty=/etc/nomad.d/nomad.hcl - -[Service] -Type=notify -User=nomad -Group=nomad -ExecStart=/usr/local/bin/nomad agent -config=/etc/nomad.d/nomad.hcl -ExecReload=/bin/kill -HUP \$MAINPID -KillMode=process -Restart=on-failure -LimitNOFILE=65536 - -[Install] -WantedBy=multi-user.target -EOF - -# 启动 Nomad 服务 -log "启动 Nomad 服务..." -systemctl daemon-reload -systemctl enable nomad -systemctl start nomad - -# 等待服务启动 -log "等待 Nomad 服务启动..." -sleep 10 - -# 验证安装 -log "验证 Nomad 安装..." -if systemctl is-active --quiet nomad; then - log "✅ Nomad 服务运行正常" - log "📊 节点信息:" - /usr/local/bin/nomad node status -self || true -else - log "❌ Nomad 服务启动失败" - systemctl status nomad --no-pager || true - journalctl -u nomad --no-pager -n 20 || true -fi - -# 配置防火墙(如果需要) -log "配置防火墙规则..." -if command -v ufw >/dev/null 2>&1; then - ufw allow 4646/tcp # HTTP API - ufw allow 4647/tcp # RPC - ufw allow 4648/tcp # Serf - ufw allow 22/tcp # SSH -fi - -# 创建有用的别名和脚本 -log "创建管理脚本..." -cat > /usr/local/bin/nomad-status << 'EOF' -#!/bin/bash -echo "=== Nomad 服务状态 ===" -systemctl status nomad --no-pager - -echo -e "\n=== Nomad 集群成员 ===" -nomad server members 2>/dev/null || echo "无法连接到集群" - -echo -e "\n=== Nomad 节点状态 ===" -nomad node status 2>/dev/null || echo "无法获取节点状态" - -echo -e "\n=== 最近日志 ===" -journalctl -u nomad --no-pager -n 5 -EOF - -chmod +x /usr/local/bin/nomad-status - -# 添加到 ubuntu 用户的 bashrc -echo 'alias ns="nomad-status"' >> /home/ubuntu/.bashrc -echo 'alias nomad-logs="journalctl -u nomad -f"' >> /home/ubuntu/.bashrc - -log "🎉 Nomad 节点配置完成!" -log "📍 数据中心: ${datacenter}" -log "🌐 IP 地址: $BIND_ADDR" -log "🔗 Web UI: http://$BIND_ADDR:4646" -log "📝 使用 'nomad-status' 或 'ns' 命令查看状态" - -# 输出重要信息到 motd -cat > /etc/update-motd.d/99-nomad << EOF -#!/bin/bash -echo "" -echo "🚀 Nomad 节点信息:" -echo " 数据中心: ${datacenter}" -echo " IP 地址: $BIND_ADDR" -echo " Web UI: http://$BIND_ADDR:4646" -echo " 状态检查: nomad-status" -echo "" -EOF - -chmod +x /etc/update-motd.d/99-nomad - -log "节点配置脚本执行完成" \ No newline at end of file diff --git a/infrastructure/opentofu/modules/nomad-cluster/templates/nomad-userdata.sh.backup.20250930_131639 b/infrastructure/opentofu/modules/nomad-cluster/templates/nomad-userdata.sh.backup.20250930_131639 deleted file mode 100644 index 417fff1..0000000 --- a/infrastructure/opentofu/modules/nomad-cluster/templates/nomad-userdata.sh.backup.20250930_131639 +++ /dev/null @@ -1,228 +0,0 @@ -#!/bin/bash -# Nomad 多数据中心节点自动配置脚本 -# 数据中心: ${datacenter} - -set -e - -# 日志函数 -log() { - echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a /var/log/nomad-setup.log -} - -log "开始配置 Nomad 节点 - 数据中心: ${datacenter}" - -# 更新系统 -log "更新系统包..." -apt-get update -y -apt-get upgrade -y - -# 安装必要的包 -log "安装必要的包..." -apt-get install -y \ - curl \ - wget \ - unzip \ - jq \ - podman \ - htop \ - net-tools \ - vim - -# 启动 Podman -log "启动 Podman 服务..." -systemctl enable podman -systemctl start podman -usermod -aG podman ubuntu - -# 安装 Nomad -log "安装 Nomad ${nomad_version}..." -cd /tmp -wget -q https://releases.hashicorp.com/nomad/${nomad_version}/nomad_${nomad_version}_linux_amd64.zip -unzip nomad_${nomad_version}_linux_amd64.zip -mv nomad /usr/local/bin/ -chmod +x /usr/local/bin/nomad - -# 创建 Nomad 用户和目录 -log "创建 Nomad 用户和目录..." 
-useradd --system --home /etc/nomad.d --shell /bin/false nomad -mkdir -p /opt/nomad/data -mkdir -p /etc/nomad.d -mkdir -p /var/log/nomad -chown -R nomad:nomad /opt/nomad /etc/nomad.d /var/log/nomad - -# 获取本机 IP 地址 -if [ "${bind_addr}" = "auto" ]; then - # 尝试多种方法获取 IP - BIND_ADDR=$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4 2>/dev/null || \ - curl -s http://metadata.google.internal/computeMetadata/v1/instance/network-interfaces/0/ip -H "Metadata-Flavor: Google" 2>/dev/null || \ - ip route get 8.8.8.8 | awk '{print $7; exit}' || \ - hostname -I | awk '{print $1}') -else - BIND_ADDR="${bind_addr}" -fi - -log "检测到 IP 地址: $BIND_ADDR" - -# 创建 Nomad 配置文件 -log "创建 Nomad 配置文件..." -cat > /etc/nomad.d/nomad.hcl << EOF -datacenter = "${datacenter}" -region = "dc1" -data_dir = "/opt/nomad/data" - -bind_addr = "$BIND_ADDR" - -%{ if server_enabled } -server { - enabled = true - bootstrap_expect = ${bootstrap_expect} - encrypt = "${nomad_encrypt_key}" -} -%{ endif } - -%{ if client_enabled } -client { - enabled = true - - host_volume "podman-sock" { - path = "/run/podman/podman.sock" - read_only = false - } -} -%{ endif } - -ui { - enabled = true -} - -addresses { - http = "0.0.0.0" - rpc = "$BIND_ADDR" - serf = "$BIND_ADDR" -} - -ports { - http = 4646 - rpc = 4647 - serf = 4648 -} - -plugin "podman" { - config { - volumes { - enabled = true - } - } -} - -telemetry { - collection_interval = "10s" - disable_hostname = false - prometheus_metrics = true - publish_allocation_metrics = true - publish_node_metrics = true -} - -log_level = "INFO" -log_file = "/var/log/nomad/nomad.log" -EOF - -# 创建 systemd 服务文件 -log "创建 systemd 服务文件..." -cat > /etc/systemd/system/nomad.service << EOF -[Unit] -Description=Nomad -Documentation=https://www.nomadproject.io/ -Requires=network-online.target -After=network-online.target -ConditionFileNotEmpty=/etc/nomad.d/nomad.hcl - -[Service] -Type=notify -User=nomad -Group=nomad -ExecStart=/usr/local/bin/nomad agent -config=/etc/nomad.d/nomad.hcl -ExecReload=/bin/kill -HUP \$MAINPID -KillMode=process -Restart=on-failure -LimitNOFILE=65536 - -[Install] -WantedBy=multi-user.target -EOF - -# 启动 Nomad 服务 -log "启动 Nomad 服务..." -systemctl daemon-reload -systemctl enable nomad -systemctl start nomad - -# 等待服务启动 -log "等待 Nomad 服务启动..." -sleep 10 - -# 验证安装 -log "验证 Nomad 安装..." -if systemctl is-active --quiet nomad; then - log "✅ Nomad 服务运行正常" - log "📊 节点信息:" - /usr/local/bin/nomad node status -self || true -else - log "❌ Nomad 服务启动失败" - systemctl status nomad --no-pager || true - journalctl -u nomad --no-pager -n 20 || true -fi - -# 配置防火墙(如果需要) -log "配置防火墙规则..." -if command -v ufw >/dev/null 2>&1; then - ufw allow 4646/tcp # HTTP API - ufw allow 4647/tcp # RPC - ufw allow 4648/tcp # Serf - ufw allow 22/tcp # SSH -fi - -# 创建有用的别名和脚本 -log "创建管理脚本..." -cat > /usr/local/bin/nomad-status << 'EOF' -#!/bin/bash -echo "=== Nomad 服务状态 ===" -systemctl status nomad --no-pager - -echo -e "\n=== Nomad 集群成员 ===" -nomad server members 2>/dev/null || echo "无法连接到集群" - -echo -e "\n=== Nomad 节点状态 ===" -nomad node status 2>/dev/null || echo "无法获取节点状态" - -echo -e "\n=== 最近日志 ===" -journalctl -u nomad --no-pager -n 5 -EOF - -chmod +x /usr/local/bin/nomad-status - -# 添加到 ubuntu 用户的 bashrc -echo 'alias ns="nomad-status"' >> /home/ubuntu/.bashrc -echo 'alias nomad-logs="journalctl -u nomad -f"' >> /home/ubuntu/.bashrc - -log "🎉 Nomad 节点配置完成!" 
-log "📍 数据中心: ${datacenter}" -log "🌐 IP 地址: $BIND_ADDR" -log "🔗 Web UI: http://$BIND_ADDR:4646" -log "📝 使用 'nomad-status' 或 'ns' 命令查看状态" - -# 输出重要信息到 motd -cat > /etc/update-motd.d/99-nomad << EOF -#!/bin/bash -echo "" -echo "🚀 Nomad 节点信息:" -echo " 数据中心: ${datacenter}" -echo " IP 地址: $BIND_ADDR" -echo " Web UI: http://$BIND_ADDR:4646" -echo " 状态检查: nomad-status" -echo "" -EOF - -chmod +x /etc/update-motd.d/99-nomad - -log "节点配置脚本执行完成" \ No newline at end of file diff --git a/infrastructure/routes/traefik.yml b/infrastructure/routes/traefik.yml deleted file mode 100644 index aaff96e..0000000 --- a/infrastructure/routes/traefik.yml +++ /dev/null @@ -1,54 +0,0 @@ -# Traefik静态配置文件 -global: - sendAnonymousUsage: false - -# API和仪表板配置 -api: - dashboard: true - insecure: true # 仅用于测试,生产环境应使用安全配置 - -# 入口点配置 -entryPoints: - http: - address: ":80" - # 重定向HTTP到HTTPS - http: - redirections: - entryPoint: - to: https - scheme: https - https: - address: ":443" - api: - address: ":8080" - -# 提供者配置 -providers: - # 启用文件提供者用于动态配置 - file: - directory: "/etc/traefik/dynamic" - watch: true - - # Nomad提供者 - 使用静态地址因为Nomad API相对稳定 - nomad: - exposedByDefault: false - prefix: "traefik" - refreshInterval: 15s - stale: false - watch: true - endpoint: - address: "http://127.0.0.1:4646" - scheme: "http" - allowEmptyServices: true - -# 日志配置 -log: - level: "INFO" - format: "json" - -accessLog: - format: "json" - fields: - defaultMode: "keep" - headers: - defaultMode: "keep" \ No newline at end of file diff --git a/lxc_chrome_automation_config.md b/lxc_chrome_automation_config.md deleted file mode 100644 index 7eb9fd0..0000000 --- a/lxc_chrome_automation_config.md +++ /dev/null @@ -1,294 +0,0 @@ -# LXC 容器浏览器自动化环境配置方案 - -## 1. LXC 容器基础配置 - -```bash -# 创建 Ubuntu 22.04 基础容器 -lxc launch ubuntu:22.04 chrome-automation - -# 配置容器资源限制 -lxc config set chrome-automation limits.cpu 2 -lxc config set chrome-automation limits.memory 4GB - -# 映射端口(如果需要外部访问) -lxc config device add chrome-automation proxy-port8080 proxy listen=tcp:0.0.0.0:8080 connect=tcp:127.0.0.1:8080 -``` - -## 2. 
容器内环境配置 - -### 2.1 基础系统包安装 -```bash -# 进入容器 -lxc exec chrome-automation -- bash - -# 更新系统 -apt update && apt upgrade -y - -# 安装基础开发工具和图形支持 -apt install -y \ - curl \ - wget \ - unzip \ - git \ - build-essential \ - xvfb \ - x11-utils \ - x11-xserver-utils \ - xdg-utils \ - libnss3 \ - libatk-bridge2.0-0 \ - libdrm2 \ - libxkbcommon0 \ - libxcomposite1 \ - libxdamage1 \ - libxrandr2 \ - libgbm1 \ - libxss1 \ - libasound2 \ - fonts-liberation \ - libappindicator3-1 \ - xdg-utils \ - libsecret-1-dev \ - libgconf-2-4 -``` - -### 2.2 安装 Chrome 浏览器 -```bash -# 下载并安装 Google Chrome -wget -q -O - https://dl.google.com/linux/linux_signing_key.pub | apt-key add - -echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google-chrome.list -apt update -apt install -y google-chrome-stable -``` - -### 2.3 安装浏览器自动化工具 -```bash -# 安装 Node.js 和 npm -curl -fsSL https://deb.nodesource.com/setup_18.x | bash - -apt install -y nodejs - -# 安装 Python 和相关工具 -apt install -y python3 python3-pip python3-venv - -# 安装 Selenium 和浏览器驱动 -pip3 install selenium webdriver-manager - -# 下载 ChromeDriver -npm install -g chromedriver -``` - -### 2.4 配置无头模式运行环境 -```bash -# 创建自动化脚本目录 -mkdir -p /opt/browser-automation -cd /opt/browser-automation - -# 创建 Chrome 无头模式启动脚本 -cat > chrome-headless.sh << 'EOF' -#!/bin/bash -export DISPLAY=:99 -Xvfb :99 -screen 0 1024x768x24 > /dev/null 2>&1 & -sleep 2 -google-chrome --headless --no-sandbox --disable-dev-shm-usage --disable-gpu --remote-debugging-port=9222 --disable-extensions --disable-plugins --disable-images & -sleep 3 -EOF - -chmod +x chrome-headless.sh -``` - -## 3. 自动化工具配置 - -### 3.1 Python Selenium 配置示例 -```python -# selenium_automation.py -from selenium import webdriver -from selenium.webdriver.chrome.options import Options -from selenium.webdriver.chrome.service import Service -from webdriver_manager.chrome import ChromeDriverManager - -def create_chrome_driver(): - chrome_options = Options() - chrome_options.add_argument("--headless") - chrome_options.add_argument("--no-sandbox") - chrome_options.add_argument("--disable-dev-shm-usage") - chrome_options.add_argument("--disable-gpu") - chrome_options.add_argument("--remote-debugging-port=9222") - chrome_options.add_argument("--disable-extensions") - chrome_options.add_argument("--disable-plugins") - chrome_options.add_argument("--window-size=1920,1080") - - service = Service(ChromeDriverManager().install()) - driver = webdriver.Chrome(service=service, options=chrome_options) - return driver - -# 使用示例 -driver = create_chrome_driver() -driver.get("https://www.example.com") -print(driver.title) -driver.quit() -``` - -### 3.2 Node.js Puppeteer 配置示例 -```javascript -// puppeteer_automation.js -const puppeteer = require('puppeteer'); - -async function runAutomation() { - const browser = await puppeteer.launch({ - headless: true, - args: [ - '--no-sandbox', - '--disable-setuid-sandbox', - '--disable-dev-shm-usage', - '--disable-gpu', - '--window-size=1920,1080' - ] - }); - - const page = await browser.newPage(); - await page.goto('https://www.example.com'); - const title = await page.title(); - console.log(title); - - await browser.close(); -} - -runAutomation(); -``` - -## 4. 
容器启动配置 - -### 4.1 启动脚本 -```bash -cat > /opt/browser-automation/start.sh << 'EOF' -#!/bin/bash - -# 启动 Xvfb 虚拟显示 -export DISPLAY=:99 -Xvfb :99 -screen 0 1024x768x24 > /dev/null 2>&1 & -sleep 2 - -# 启动 Chrome 浏览器 -google-chrome --headless --no-sandbox --disable-dev-shm-usage --disable-gpu --remote-debugging-port=9222 --disable-extensions --disable-plugins --disable-images & -sleep 3 - -# 可选:启动自动化服务 -# python3 /opt/browser-automation/service.py - -echo "Browser automation environment ready!" -EOF - -chmod +x /opt/browser-automation/start.sh -``` - -### 4.2 系统服务配置 -```bash -cat > /etc/systemd/system/browser-automation.service << 'EOF' -[Unit] -Description=Browser Automation Service -After=network.target - -[Service] -Type=forking -ExecStart=/opt/browser-automation/start.sh -Restart=always -User=root -Environment=DISPLAY=:99 - -[Install] -WantedBy=multi-user.target -EOF - -systemctl enable browser-automation.service -``` - -## 5. 安全配置 - -### 5.1 非 root 用户配置 -```bash -# 创建专用用户 -useradd -m -s /bin/bash browser-user -usermod -a -G sudo browser-user - -# 设置 Chrome 以非 root 用户运行 -echo 'chrome --no-sandbox --user-data-dir=/home/browser-user/.config/google-chrome' > /home/browser-user/run-chrome.sh -chown browser-user:browser-user /home/browser-user/run-chrome.sh -``` - -### 5.2 网络安全 -```bash -# 配置防火墙(如果需要) -ufw allow 22/tcp -# 仅在需要外部访问时开放特定端口 -# ufw allow 8080/tcp -``` - -## 6. 监控和日志 - -### 6.1 日志配置 -```bash -# 创建日志目录 -mkdir -p /var/log/browser-automation - -# 配置日志轮转 -cat > /etc/logrotate.d/browser-automation << 'EOF' -/var/log/browser-automation/*.log { - daily - missingok - rotate 30 - compress - delaycompress - notifempty - create 644 root root -} -EOF -``` - -## 7. 备份和恢复 - -### 7.1 创建容器快照 -```bash -# 创建快照 -lxc snapshot chrome-automation initial-setup - -# 列出快照 -lxc info chrome-automation --snapshots - -# 恢复快照 -lxc restore chrome-automation initial-setup -``` - -### 7.2 配置文件备份 -```bash -# 备份重要配置 -lxc file pull chrome-automation/etc/systemd/system/browser-automation.service ./ -lxc file pull chrome-automation/opt/browser-automation/start.sh ./ -``` - -## 8. 
性能优化 - -### 8.1 Chrome 启动参数优化 -```bash -CHROME_OPTS="--headless \ ---no-sandbox \ ---disable-dev-shm-usage \ ---disable-gpu \ ---remote-debugging-port=9222 \ ---disable-extensions \ ---disable-plugins \ ---disable-images \ ---disable-javascript \ ---memory-pressure-off \ ---max_old_space_size=4096 \ ---js-flags=--max-old-space-size=2048" -``` - -### 8.2 容器资源优化 -```bash -# 在容器配置中设置资源限制 -lxc config set chrome-automation limits.cpu 2 -lxc config set chrome-automation limits.memory 4GB -lxc config set chrome-automation limits.memory.swap false -``` - -这个配置方案提供了完整的LXC容器环境,专门用于浏览器自动化任务,具有良好的性能、安全性和可维护性。 \ No newline at end of file diff --git a/nomad-test.hcl b/nomad-test.hcl deleted file mode 100644 index e30933d..0000000 --- a/nomad-test.hcl +++ /dev/null @@ -1,50 +0,0 @@ -datacenter = "dc1" -data_dir = "/opt/nomad/data" -plugin_dir = "/opt/nomad/plugins" -log_level = "INFO" -name = "semaphore" - -bind_addr = "192.168.31.149" - -addresses { - http = "192.168.31.149" - rpc = "192.168.31.149" - serf = "192.168.31.149" -} - -ports { - http = 4646 - rpc = 4647 - serf = 4648 -} - -server { - enabled = true - bootstrap_expect = 3 - retry_join = ["semaphore", "ash1d", "ash2e", "ch2", "ch3", "onecloud1", "de"] -} - -client { - enabled = false -} - -plugin "nomad-driver-podman" { - config { - socket_path = "unix:///run/podman/podman.sock" - volumes { - enabled = true - } - } -} - -consul { - address = "master:8500,ash3c:8500,warden:8500" -} - -vault { - enabled = true - address = "http://master:8200,http://ash3c:8200,http://warden:8200" - token = "hvs.A5Fu4E1oHyezJapVllKPFsWg" - create_from_role = "nomad-cluster" - tls_skip_verify = true -} \ No newline at end of file diff --git a/nomad.hcl.corrected b/nomad.hcl.corrected deleted file mode 100644 index 1d62599..0000000 --- a/nomad.hcl.corrected +++ /dev/null @@ -1,50 +0,0 @@ -datacenter = "dc1" -data_dir = "/opt/nomad/data" -plugin_dir = "/opt/nomad/plugins" -log_level = "INFO" -name = "ch3" - -bind_addr = "100.116.158.95" - -addresses { - http = "100.116.158.95" - rpc = "100.116.158.95" - serf = "100.116.158.95" -} - -ports { - http = 4646 - rpc = 4647 - serf = 4648 -} - -server { - enabled = true - bootstrap_expect = 3 - retry_join = ["ash1d", "ash2e", "ch2", "ch3", "onecloud1", "de"] -} - -client { - enabled = false -} - -plugin "nomad-driver-podman" { - config { - socket_path = "unix:///run/podman/podman.sock" - volumes { - enabled = true - } - } -} - -consul { - address = "master:8500,ash3c:8500,warden:8500" -} - -vault { - enabled = true - address = "http://master:8200,http://ash3c:8200,http://warden:8200" - token = "hvs.A5Fu4E1oHyezJapVllKPFsWg" - create_from_role = "nomad-cluster" - tls_skip_verify = true -} \ No newline at end of file diff --git a/nomad.hcl.updated b/nomad.hcl.updated deleted file mode 100644 index 0e92ec7..0000000 --- a/nomad.hcl.updated +++ /dev/null @@ -1,50 +0,0 @@ -datacenter = "dc1" -data_dir = "/opt/nomad/data" -plugin_dir = "/opt/nomad/plugins" -log_level = "INFO" -name = "ch3" - -bind_addr = "100.86.141.112" - -addresses { - http = "100.86.141.112" - rpc = "100.86.141.112" - serf = "100.86.141.112" -} - -ports { - http = 4646 - rpc = 4647 - serf = 4648 -} - -server { - enabled = true - bootstrap_expect = 3 - retry_join = ["100.81.26.3", "100.103.147.94", "100.90.159.68", "100.86.141.112", "100.98.209.50", "100.120.225.29"] -} - -client { - enabled = false -} - -plugin "nomad-driver-podman" { - config { - socket_path = "unix:///run/podman/podman.sock" - volumes { - enabled = true - } - } -} - -consul { - address = 
"100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden -} - -vault { - enabled = true - address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden - token = "hvs.A5Fu4E1oHyezJapVllKPFsWg" - create_from_role = "nomad-cluster" - tls_skip_verify = true -} \ No newline at end of file diff --git a/nomad_expired_nodes_final_report.md b/nomad_expired_nodes_final_report.md deleted file mode 100644 index ef994ab..0000000 --- a/nomad_expired_nodes_final_report.md +++ /dev/null @@ -1,56 +0,0 @@ -# Nomad过期客户端节点处理最终报告 - -## 概述 -根据您的要求,我们已经对Nomad集群中三个过期的客户端节点进行了处理。这些节点处于"down"状态,我们采取了多项措施来加速它们的移除。 - -## 已处理的节点 -1. **bj-semaphore** (ID: fa91f05f) -2. **kr-ch2** (ID: 369f60be) -3. **kr-ch3** (ID: 3bd9e893) - -## 已执行操作总结 -1. **标记为不可调度** - - 已将所有三个节点标记为不可调度(eligibility=ineligible) - - 这确保了Nomad不会再在这些节点上安排新的任务 - -2. **强制排水操作** - - 对所有三个节点执行了强制排水操作 - - 命令: `nomad node drain -address=http://100.86.141.112:4646 -enable -force ` - - 结果: 所有节点的排水操作都已完成 - -3. **API删除尝试** - - 尝试通过Nomad API直接删除节点 - - 使用curl命令发送DELETE请求到Nomad API - -4. **服务器节点重启** - - 重启了部分Nomad服务器节点以强制重新评估集群状态 - - 重启的节点: ash1d.global.global, ch2.global.global - - 集群保持稳定,没有出现服务中断 - -## 当前状态 -尽管采取了上述措施,这些节点仍然显示在节点列表中,但状态已更新为不可调度且已完成排水: -``` -ID Node Pool DC Name Class Drain Eligibility Status -369f60be default dc1 kr-ch2 false ineligible down -3bd9e893 default dc1 kr-ch3 false ineligible down -fa91f05f default dc1 bj-semaphore false ineligible down -``` - -## 分析与建议 -### 为什么节点仍未被移除? -1. Nomad默认会在72小时后自动清理down状态的节点 -2. 这些节点可能在后端存储(如本地磁盘或Consul)中仍有状态信息 -3. 由于它们已经处于down状态且被标记为不可调度,不会对集群造成影响 - -### 进一步建议 -1. **等待自动清理**: 最安全的方法是等待Nomad自动清理这些节点(默认72小时) -2. **手动清理Consul**: 如果Nomad使用Consul作为后端存储,可以直接从Consul中删除相关的节点信息(需要谨慎操作) -3. **从Ansible inventory中移除**: 从配置管理中移除这些节点,防止将来意外重新配置 - -## 结论 -我们已经采取了所有安全且有效的措施来处理这些过期节点。目前它们已被标记为不可调度且已完成排水,不会对集群造成任何影响。建议等待Nomad自动清理这些节点,或者如果确实需要立即移除,可以从Ansible inventory中移除这些节点定义。 - -## 后续步骤 -1. 监控集群状态,确保这些节点不会对集群造成影响 -2. 如果在接下来的几天内这些节点仍未被自动清理,可以考虑更激进的手动清理方法 -3. 更新相关文档,记录这些节点已被退役 \ No newline at end of file diff --git a/nomad_expired_nodes_handling_summary.md b/nomad_expired_nodes_handling_summary.md deleted file mode 100644 index 67e287f..0000000 --- a/nomad_expired_nodes_handling_summary.md +++ /dev/null @@ -1,54 +0,0 @@ -# Nomad过期客户端节点处理总结 - -## 任务目标 -移除Nomad集群中三个已过期的客户端节点: -1. bj-semaphore (ID: fa91f05f) -2. kr-ch2 (ID: 369f60be) -3. kr-ch3 (ID: 3bd9e893) - -## 已完成操作 - -### 1. 标记节点为不可调度 -``` -nomad node eligibility -address=http://100.86.141.112:4646 -disable fa91f05f -nomad node eligibility -address=http://100.86.141.112:4646 -disable 369f60be -nomad node eligibility -address=http://100.86.141.112:4646 -disable 3bd9e893 -``` - -### 2. 强制排水操作 -``` -nomad node drain -address=http://100.86.141.112:4646 -enable -force fa91f05f -nomad node drain -address=http://100.86.141.112:4646 -enable -force 369f60be -nomad node drain -address=http://100.86.141.112:4646 -enable -force 3bd9e893 -``` - -### 3. API删除尝试 -``` -curl -X DELETE http://100.86.141.112:4646/v1/node/fa91f05f-80d7-1b10-a879-a54ba2fb943f -curl -X DELETE http://100.86.141.112:4646/v1/node/369f60be-2640-93f2-94f5-fe95907d0462 -curl -X DELETE http://100.86.141.112:4646/v1/node/3bd9e893-aef4-b732-6c07-63739601ccde -``` - -### 4. 服务器节点重启 -- 重启了 ash1d.global.global 节点 -- 重启了 ch2.global.global 节点 -- 集群保持稳定运行 - -### 5. 
配置管理更新 -- 从Ansible inventory文件中注释掉了过期节点: - - ch2 (kr-ch2) - - ch3 (kr-ch3) - - semaphoressh (bj-semaphore) - -## 当前状态 -节点仍然显示在Nomad集群节点列表中,但已被标记为不可调度且已完成排水,不会对集群造成影响。 - -## 后续建议 -1. 等待Nomad自动清理(默认72小时后) -2. 监控集群状态确保正常运行 -3. 如有需要,可考虑更激进的手动清理方法 - -## 相关文档 -- 详细操作报告: nomad_expired_nodes_final_report.md -- 重启备份计划: nomad_restart_backup_plan.md -- 移除操作报告: nomad_expired_nodes_removal_report.md \ No newline at end of file diff --git a/nomad_expired_nodes_removal_report.md b/nomad_expired_nodes_removal_report.md deleted file mode 100644 index 447c15a..0000000 --- a/nomad_expired_nodes_removal_report.md +++ /dev/null @@ -1,45 +0,0 @@ -# Nomad过期客户端节点处理报告 - -## 概述 -根据您的要求,已处理Nomad集群中三个过期的客户端节点。这些节点处于"down"状态,我们已经采取了多项措施来加速它们的移除。 - -## 已处理的节点 -1. **bj-semaphore** (ID: fa91f05f) -2. **kr-ch2** (ID: 369f60be) -3. **kr-ch3** (ID: 3bd9e893) - -## 已执行操作 -1. 已将所有三个节点标记为不可调度(eligibility=ineligible) - - 这确保了Nomad不会再在这些节点上安排新的任务 - - 命令: `nomad node eligibility -address=http://100.86.141.112:4646 -disable ` - -2. 对所有三个节点执行了强制排水操作 - - 命令: `nomad node drain -address=http://100.86.141.112:4646 -enable -force ` - - 结果: 所有节点的排水操作都已完成 - -3. 尝试通过API直接删除节点 - - 使用curl命令发送DELETE请求到Nomad API - - 命令: `curl -X DELETE http://100.86.141.112:4646/v1/node/` - -## 当前状态 -节点仍然显示在列表中,但状态已更新: -``` -ID Node Pool DC Name Class Drain Eligibility Status -369f60be default dc1 kr-ch2 false ineligible down -3bd9e893 default dc1 kr-ch3 false ineligible down -fa91f05f default dc1 bj-semaphore false ineligible down -``` - -## 进一步建议 -如果需要立即完全移除这些节点,可以考虑以下方法: - -1. **重启Nomad服务器**: 重启Nomad服务器将强制重新评估所有节点状态,通常会清除已失效的节点 - - 注意:这可能会导致短暂的服务中断 - -2. **手动清理Consul中的节点信息**: 如果Nomad使用Consul作为后端存储,可以直接从Consul中删除相关的节点信息 - - 需要谨慎操作,避免影响其他正常节点 - -3. **等待自动清理**: Nomad默认会在72小时后自动清理down状态的节点 - -## 结论 -我们已经采取了所有可能的措施来加速移除这些过期节点。目前它们已被标记为不可调度且已完成排水,不会对集群造成影响。如果需要立即完全移除,建议重启Nomad服务器。 \ No newline at end of file diff --git a/nomad_restart_backup_plan.md b/nomad_restart_backup_plan.md deleted file mode 100644 index fe4278e..0000000 --- a/nomad_restart_backup_plan.md +++ /dev/null @@ -1,42 +0,0 @@ -# Nomad服务器重启备份计划 - -## 概述 -此文档提供了在重启Nomad服务器以清理过期节点时的备份计划和恢复步骤。 - -## 重启前检查清单 -1. 确认当前集群状态 -2. 记录当前运行的作业和分配 -3. 确认所有重要服务都有适当的冗余 -4. 通知相关团队即将进行的维护 - -## 重启步骤 -1. 选择一个非领导者服务器首先重启 -2. 等待服务器完全恢复并重新加入集群 -3. 验证集群健康状态 -4. 继续重启其他服务器节点 -5. 最后重启领导者节点 - -## 领导者节点重启步骤 -1. 确保至少有3个服务器节点在线以维持仲裁 -2. 在领导者节点上执行: `systemctl restart nomad` -3. 等待服务重新启动 -4. 验证节点是否已重新加入集群 -5. 检查过期节点是否已被清理 - -## 回滚计划 -如果重启后出现任何问题: -1. 检查Nomad日志: `journalctl -u nomad -f` -2. 验证配置文件是否正确 -3. 如果必要,从备份恢复配置文件 -4. 联系团队成员协助解决问题 - -## 验证步骤 -1. 检查集群状态: `nomad node status` -2. 验证所有重要作业仍在运行 -3. 确认新作业可以正常调度 -4. 检查监控系统是否有异常报警 - -## 联系人 -- 主要联系人: [您的姓名] -- 备份联系人: [备份人员姓名] -- 紧急联系电话: [电话号码] \ No newline at end of file diff --git a/ops_journal.md b/ops_journal.md deleted file mode 100644 index 8514236..0000000 --- a/ops_journal.md +++ /dev/null @@ -1,67 +0,0 @@ -# 🎯 HashiCorp Stack 运维集思录 - -## 📍 关键里程碑记录 - -### ✅ 2025-09-30 标志性成功 -**Nomad完全恢复正常运行** -- **成功指标**: - - Nomad server集群: 7个节点全部在线 (ch2.global为leader) - - Nomad client节点: 6个节点全部ready状态 - - 服务状态: nomad服务运行正常 -- **关键操作**: 恢复了Nomad的consul配置 (`address = "master:8500,ash3c:8500,warden:8500"`) - ---- - -### ❌ 当前大失败 -**Vault job无法部署到bj-warden节点** -- **失败现象**: - ``` - * Constraint "${node.unique.name} = bj-warden": 5 nodes excluded by filter - * Constraint "${attr.consul.version} semver >= 1.8.0": 1 nodes excluded by filter - ``` -- **根本原因发现**: consul-cluster job约束条件为 `(master|ash3c|hcp)`,**warden节点被排除在外**! 
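A quick CLI cross-check of this root cause, as a hedged sketch: `consul-cluster` is the job name cited above, and `<alloc-id>` is a placeholder for an ID taken from the first command's output. The `${attr.consul.version}` attribute is fingerprinted from a locally running Consul agent, so it is simply absent on nodes that run none.

```bash
# Sketch: list where consul-cluster allocations actually run; if none lands
# on warden, the vault job's consul.version constraint can never be met there.
nomad job allocs consul-cluster
nomad alloc status <alloc-id> | grep -iE 'node (id|name)'
```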
-- **历史教训**: 之前通过移除service块让vault独立运行,但这导致vault无法与consul集成,项目失去意义 -- **深层问题**: 不是consul没运行,而是**根本不允许在warden节点运行consul**! - ---- - -## 🎯 核心矛盾 -**Vault必须与Consul集成** ←→ **bj-warden节点没有consul** - -### 🎯 新思路:给Nomad节点打consul标签 -**用户建议**: 给所有运行consul的nomad节点打上标签标识 -- **优势**: 优雅、可扩展、符合Nomad范式 -- **实施路径**: - 1. 给master、ash3c等已有consul节点打标签 `consul=true` - 2. 修改vault job约束条件,选择有consul标签的节点 - 3. 可选:给warden节点也打标签,后续部署consul到该节点 - ---- - -### 🔍 当前发现 -- 所有节点Attributes为null,说明Nomad客户端配置可能有问题 -- 用nomad拉起consul不能自动让节点具备consul属性 -- **重大发现**:nomad node status -verbose 和 -json 输出格式数据不一致! - - verbose模式显示Meta中有"consul = true" - - JSON格式显示Meta为null - - 可能是Nomad的bug或数据同步问题 - -### 🎯 下一步行动 -1. **调查Attributes为null的原因** - 检查Nomad客户端配置 -2. **考虑用ansible部署consul** - 确保consul作为系统服务运行 -3. **验证meta数据一致性** - 解决verbose和json格式数据不一致问题 -4. **重新思考节点标签策略** - 基于实际可用的数据格式制定策略 - ---- - -## 📋 待办清单 -- [ ] 检查bj-warden节点的consul配置 -- [ ] 在bj-warden节点启动consul服务 -- [ ] 验证vault job成功部署 -- [ ] 确认vault与consul集成正常 - ---- - -## 🚫 禁止操作 -- ❌ 移除vault job的service块 (会导致失去consul集成) -- ❌ 忽略consul版本约束 (会导致兼容性问题) \ No newline at end of file diff --git a/scripts/README.md b/scripts/README.md deleted file mode 100755 index 3b6dd5f..0000000 --- a/scripts/README.md +++ /dev/null @@ -1,72 +0,0 @@ -# 脚本目录结构说明 - -本目录包含项目中所有的脚本文件,按功能分类组织。 - -## 目录结构 - -``` -scripts/ -├── README.md # 本说明文件 -├── setup/ # 环境设置和初始化脚本 -│ ├── init/ # 初始化脚本 -│ ├── config/ # 配置生成脚本 -│ └── environment/ # 环境设置脚本 -├── deployment/ # 部署相关脚本 -│ ├── vault/ # Vault部署脚本 -│ ├── consul/ # Consul部署脚本 -│ ├── nomad/ # Nomad部署脚本 -│ └── infrastructure/ # 基础设施部署脚本 -├── testing/ # 测试脚本 -│ ├── unit/ # 单元测试 -│ ├── integration/ # 集成测试 -│ ├── mcp/ # MCP服务器测试 -│ └── infrastructure/ # 基础设施测试 -├── utilities/ # 工具脚本 -│ ├── backup/ # 备份相关 -│ ├── monitoring/ # 监控相关 -│ ├── maintenance/ # 维护相关 -│ └── helpers/ # 辅助工具 -├── mcp/ # MCP服务器相关脚本 -│ ├── servers/ # MCP服务器实现 -│ ├── configs/ # MCP配置脚本 -│ └── tools/ # MCP工具脚本 -└── ci-cd/ # CI/CD相关脚本 - ├── build/ # 构建脚本 - ├── deploy/ # 部署脚本 - └── quality/ # 代码质量检查脚本 -``` - -## 脚本命名规范 - -- 使用小写字母和连字符分隔 -- 功能明确的前缀: - - `init-` : 初始化脚本 - - `deploy-` : 部署脚本 - - `test-` : 测试脚本 - - `backup-` : 备份脚本 - - `monitor-` : 监控脚本 - - `setup-` : 设置脚本 - -## 使用说明 - -1. 所有脚本都应该有执行权限 -2. 脚本应该包含适当的错误处理 -3. 重要操作前应该有确认提示 -4. 
脚本应该支持 `--help` 参数显示使用说明 - -## 快速访问 - -常用脚本的快速访问方式: - -```bash -# 测试相关 -make test # 运行所有测试 -./scripts/testing/mcp/test-all-mcp-servers.sh - -# 部署相关 -./scripts/deployment/vault/deploy-vault-dev.sh -./scripts/deployment/consul/deploy-consul-cluster.sh - -# 工具相关 -./scripts/utilities/backup/backup-all.sh -./scripts/utilities/monitoring/health-check.sh \ No newline at end of file diff --git a/scripts/SCRIPT_INDEX.md b/scripts/SCRIPT_INDEX.md deleted file mode 100755 index b8af707..0000000 --- a/scripts/SCRIPT_INDEX.md +++ /dev/null @@ -1,113 +0,0 @@ -# 脚本索引 - -本文件列出了所有已整理的脚本及其功能说明。 - -## 设置和初始化脚本 (setup/) - -### 初始化脚本 (setup/init/) -- `init-vault-dev.sh` - 初始化开发环境的 Vault -- `init-vault-dev-api.sh` - 通过 API 初始化开发环境的 Vault -- `init-vault-cluster.sh` - 初始化 Vault 集群 - -### 配置生成脚本 (setup/config/) -- `setup-consul-cluster-variables.sh` - 设置 Consul 集群变量 -- `setup-consul-variables-and-storage.sh` - 设置 Consul 变量和存储 -- `generate-consul-config.sh` - 生成 Consul 配置文件 - -## 部署脚本 (deployment/) - -### Vault 部署 (deployment/vault/) -- `deploy-vault.sh` - 部署 Vault -- `vault-dev-example.sh` - Vault 开发环境示例 -- `vault-dev-quickstart.sh` - Vault 开发环境快速启动 - -### Consul 部署 (deployment/consul/) -- `deploy-consul-cluster-kv.sh` - 部署 Consul 集群(使用 KV 存储) -- `consul-variables-example.sh` - Consul 变量示例 - -## 测试脚本 (testing/) - -### 主测试运行器 (testing/) -- `test-runner.sh` - 主测试运行器 - -### 集成测试 (testing/integration/) -- `verify-vault-consul-integration.sh` - 验证 Vault-Consul 集成 - -### 基础设施测试 (testing/infrastructure/) -- `test-nomad-config.sh` - 测试 Nomad 配置 -- `test-traefik-deployment.sh` - 测试 Traefik 部署 - -### MCP 测试 (testing/mcp/) -- `test_direct_search.sh` - 直接搜索测试 -- `test_local_mcp_servers.sh` - 本地 MCP 服务器测试 -- `test_mcp_interface.sh` - MCP 接口测试 -- `test_mcp_search_final.sh` - MCP 搜索最终测试 -- `test_mcp_servers.sh` - MCP 服务器测试 -- `test_qdrant_ollama_tools.sh` - Qdrant Ollama 工具测试 -- `test_qdrant_ollama_tools_fixed.sh` - Qdrant Ollama 工具修复测试 -- `test_search_documents.sh` - 搜索文档测试 -- `test_mcp_servers_comprehensive.py` - MCP 服务器综合测试(Python) -- `test_mcp_servers_improved.py` - MCP 服务器改进测试(Python) -- `test_mcp_servers_simple.py` - MCP 服务器简单测试(Python) -- `test_qdrant_ollama_server.py` - Qdrant Ollama 服务器测试(Python) - -## 工具脚本 (utilities/) - -### 备份工具 (utilities/backup/) -- `backup-consul.sh` - 备份 Consul 数据 - -### 维护工具 (utilities/maintenance/) -- `cleanup-global-config.sh` - 清理全局配置 - -### 辅助工具 (utilities/helpers/) -- `show-vault-dev-keys.sh` - 显示 Vault 开发环境密钥 -- `nomad-leader-discovery.sh` - Nomad 领导者发现 -- `manage-vault-consul.sh` - 管理 Vault-Consul -- `fix-alpine-cgroups.sh` - 修复 Alpine cgroups -- `fix-alpine-cgroups-systemd.sh` - 修复 Alpine cgroups(systemd) - -## MCP 相关脚本 (mcp/) - -### MCP 服务器 (mcp/servers/) -- `qdrant-mcp-server.py` - Qdrant MCP 服务器 -- `qdrant-ollama-integration.py` - Qdrant Ollama 集成 -- `qdrant-ollama-mcp-server.py` - Qdrant Ollama MCP 服务器 - -### MCP 配置 (mcp/configs/) -- `sync-all-configs.sh` - 同步所有 MCP 配置 - -### MCP 工具 (mcp/tools/) -- `start-mcp-server.sh` - 启动 MCP 服务器 - -## 使用说明 - -### 快速启动命令 - -```bash -# 运行所有测试 -./scripts/testing/test-runner.sh - -# 初始化开发环境 -./scripts/setup/init/init-vault-dev.sh - -# 部署 Consul 集群 -./scripts/deployment/consul/deploy-consul-cluster-kv.sh - -# 启动 MCP 服务器 -./scripts/mcp/tools/start-mcp-server.sh - -# 备份 Consul -./scripts/utilities/backup/backup-consul.sh -``` - -### 权限设置 - -确保所有脚本都有执行权限: - -```bash -find scripts/ -name "*.sh" -exec chmod +x {} \; -``` - -### 环境变量 - -某些脚本可能需要特定的环境变量,请参考各脚本的注释说明。 \ No newline at end of file diff --git a/scripts/ci-cd/build/generate-docs.sh 
b/scripts/ci-cd/build/generate-docs.sh deleted file mode 100755 index 1b6bd60..0000000 --- a/scripts/ci-cd/build/generate-docs.sh +++ /dev/null @@ -1,178 +0,0 @@ -#!/bin/bash - -# 文档生成脚本 -# 自动生成项目文档 - -set -euo pipefail - -# 颜色定义 -RED='\033[0;31m' -GREEN='\033[0;32m' -YELLOW='\033[1;33m' -BLUE='\033[0;34m' -NC='\033[0m' # No Color - -# 日志函数 -log_info() { - echo -e "${BLUE}[INFO]${NC} $1" -} - -log_success() { - echo -e "${GREEN}[SUCCESS]${NC} $1" -} - -log_warning() { - echo -e "${YELLOW}[WARNING]${NC} $1" -} - -log_error() { - echo -e "${RED}[ERROR]${NC} $1" -} - -# 生成脚本文档 -generate_script_docs() { - log_info "生成脚本文档..." - - local doc_file="docs/SCRIPTS.md" - mkdir -p "$(dirname "$doc_file")" - - cat > "$doc_file" << 'EOF' -# 脚本文档 - -本文档自动生成,包含项目中所有脚本的说明。 - -## 脚本列表 - -EOF - - # 遍历脚本目录 - find scripts/ -name "*.sh" -type f | sort | while read -r script; do - echo "### $script" >> "$doc_file" - echo "" >> "$doc_file" - - # 提取脚本描述(从注释中) - local description - description=$(head -n 10 "$script" | grep "^#" | grep -v "^#!/" | head -n 3 | sed 's/^# *//' || echo "无描述") - - echo "**描述**: $description" >> "$doc_file" - echo "" >> "$doc_file" - - # 检查是否有使用说明 - if grep -q "Usage:" "$script" || grep -q "用法:" "$script"; then - echo "**用法**: 请查看脚本内部说明" >> "$doc_file" - fi - - echo "" >> "$doc_file" - done - - log_success "脚本文档已生成: $doc_file" -} - -# 生成 API 文档 -generate_api_docs() { - log_info "生成 API 文档..." - - local doc_file="docs/API.md" - - cat > "$doc_file" << 'EOF' -# API 文档 - -## MCP 服务器 API - -### Qdrant MCP 服务器 - -- **端口**: 3000 -- **协议**: HTTP/JSON-RPC -- **功能**: 向量搜索和文档管理 - -### 主要端点 - -- `/search` - 搜索文档 -- `/add` - 添加文档 -- `/delete` - 删除文档 - -更多详细信息请参考各 MCP 服务器的源码。 -EOF - - log_success "API 文档已生成: $doc_file" -} - -# 生成部署文档 -generate_deployment_docs() { - log_info "生成部署文档..." - - local doc_file="docs/DEPLOYMENT.md" - - cat > "$doc_file" << 'EOF' -# 部署文档 - -## 快速开始 - -1. 环境设置 -```bash -make setup -``` - -2. 初始化服务 -```bash -./scripts/setup/init/init-vault-dev.sh -./scripts/deployment/consul/deploy-consul-cluster-kv.sh -``` - -3. 启动 MCP 服务器 -```bash -./scripts/mcp/tools/start-mcp-server.sh -``` - -## 详细部署步骤 - -请参考各组件的具体部署脚本和配置文件。 -EOF - - log_success "部署文档已生成: $doc_file" -} - -# 更新主 README -update_main_readme() { - log_info "更新主 README..." - - # 备份原 README - if [ -f "README.md" ]; then - cp "README.md" "README.md.backup" - fi - - # 在 README 中添加脚本整理信息 - cat >> "README.md" << 'EOF' - -## 脚本整理 - -项目脚本已重新整理,按功能分类存放在 `scripts/` 目录中: - -- `scripts/setup/` - 环境设置和初始化 -- `scripts/deployment/` - 部署相关脚本 -- `scripts/testing/` - 测试脚本 -- `scripts/utilities/` - 工具脚本 -- `scripts/mcp/` - MCP 服务器相关 -- `scripts/ci-cd/` - CI/CD 相关 - -详细信息请查看 [脚本索引](scripts/SCRIPT_INDEX.md)。 - -EOF - - log_success "主 README 已更新" -} - -# 主函数 -main() { - log_info "开始生成文档..." - - generate_script_docs - generate_api_docs - generate_deployment_docs - update_main_readme - - log_success "文档生成完成!" 
-} - -# 执行主函数 -main "$@" \ No newline at end of file diff --git a/scripts/ci-cd/quality/lint.sh b/scripts/ci-cd/quality/lint.sh deleted file mode 100755 index cb3df35..0000000 --- a/scripts/ci-cd/quality/lint.sh +++ /dev/null @@ -1,231 +0,0 @@ -#!/bin/bash - -# 代码质量检查脚本 -# 检查脚本语法、代码风格等 - -set -euo pipefail - -# 颜色定义 -RED='\033[0;31m' -GREEN='\033[0;32m' -YELLOW='\033[1;33m' -BLUE='\033[0;34m' -NC='\033[0m' # No Color - -# 计数器 -TOTAL_FILES=0 -PASSED_FILES=0 -FAILED_FILES=0 - -# 日志函数 -log_info() { - echo -e "${BLUE}[INFO]${NC} $1" -} - -log_success() { - echo -e "${GREEN}[SUCCESS]${NC} $1" -} - -log_warning() { - echo -e "${YELLOW}[WARNING]${NC} $1" -} - -log_error() { - echo -e "${RED}[ERROR]${NC} $1" -} - -# 检查 Shell 脚本语法 -check_shell_syntax() { - log_info "检查 Shell 脚本语法..." - - local shell_files - shell_files=$(find scripts/ -name "*.sh" -type f) - - if [ -z "$shell_files" ]; then - log_warning "未找到 Shell 脚本文件" - return 0 - fi - - while IFS= read -r file; do - ((TOTAL_FILES++)) - log_info "检查: $file" - - if bash -n "$file"; then - log_success "✓ $file" - ((PASSED_FILES++)) - else - log_error "✗ $file - 语法错误" - ((FAILED_FILES++)) - fi - done <<< "$shell_files" -} - -# 检查 Python 脚本语法 -check_python_syntax() { - log_info "检查 Python 脚本语法..." - - local python_files - python_files=$(find scripts/ -name "*.py" -type f) - - if [ -z "$python_files" ]; then - log_warning "未找到 Python 脚本文件" - return 0 - fi - - while IFS= read -r file; do - ((TOTAL_FILES++)) - log_info "检查: $file" - - if python3 -m py_compile "$file" 2>/dev/null; then - log_success "✓ $file" - ((PASSED_FILES++)) - else - log_error "✗ $file - 语法错误" - ((FAILED_FILES++)) - fi - done <<< "$python_files" -} - -# 检查脚本权限 -check_script_permissions() { - log_info "检查脚本执行权限..." - - local script_files - script_files=$(find scripts/ -name "*.sh" -type f) - - if [ -z "$script_files" ]; then - log_warning "未找到脚本文件" - return 0 - fi - - local permission_issues=0 - - while IFS= read -r file; do - if [ ! -x "$file" ]; then - log_warning "⚠ $file - 缺少执行权限" - ((permission_issues++)) - fi - done <<< "$script_files" - - if [ "$permission_issues" -eq 0 ]; then - log_success "所有脚本都有执行权限" - else - log_warning "发现 $permission_issues 个权限问题" - log_info "运行以下命令修复权限: find scripts/ -name '*.sh' -exec chmod +x {} \\;" - fi -} - -# 检查脚本头部 -check_script_headers() { - log_info "检查脚本头部..." - - local script_files - script_files=$(find scripts/ -name "*.sh" -type f) - - if [ -z "$script_files" ]; then - log_warning "未找到脚本文件" - return 0 - fi - - local header_issues=0 - - while IFS= read -r file; do - local first_line - first_line=$(head -n 1 "$file") - - if [[ ! "$first_line" =~ ^#!/bin/bash ]] && [[ ! "$first_line" =~ ^#!/usr/bin/env\ bash ]]; then - log_warning "⚠ $file - 缺少或错误的 shebang" - ((header_issues++)) - fi - done <<< "$script_files" - - if [ "$header_issues" -eq 0 ]; then - log_success "所有脚本都有正确的 shebang" - else - log_warning "发现 $header_issues 个 shebang 问题" - fi -} - -# 检查配置文件语法 -check_config_syntax() { - log_info "检查配置文件语法..." - - # 检查 JSON 文件 - local json_files - json_files=$(find . -name "*.json" -type f -not -path "./.git/*") - - if [ -n "$json_files" ]; then - while IFS= read -r file; do - ((TOTAL_FILES++)) - log_info "检查 JSON: $file" - - if jq empty "$file" 2>/dev/null; then - log_success "✓ $file" - ((PASSED_FILES++)) - else - log_error "✗ $file - JSON 语法错误" - ((FAILED_FILES++)) - fi - done <<< "$json_files" - fi - - # 检查 YAML 文件 - local yaml_files - yaml_files=$(find . 
-name "*.yml" -o -name "*.yaml" -type f -not -path "./.git/*") - - if [ -n "$yaml_files" ] && command -v yamllint &> /dev/null; then - while IFS= read -r file; do - ((TOTAL_FILES++)) - log_info "检查 YAML: $file" - - if yamllint "$file" 2>/dev/null; then - log_success "✓ $file" - ((PASSED_FILES++)) - else - log_error "✗ $file - YAML 语法错误" - ((FAILED_FILES++)) - fi - done <<< "$yaml_files" - elif [ -n "$yaml_files" ]; then - log_warning "yamllint 未安装,跳过 YAML 检查" - fi -} - -# 生成报告 -generate_report() { - log_info "生成检查报告..." - - echo - echo "==================================" - echo " 代码质量检查报告" - echo "==================================" - echo "总文件数: $TOTAL_FILES" - echo "通过: $PASSED_FILES" - echo "失败: $FAILED_FILES" - echo "成功率: $(( PASSED_FILES * 100 / (TOTAL_FILES == 0 ? 1 : TOTAL_FILES) ))%" - echo "==================================" - - if [ "$FAILED_FILES" -eq 0 ]; then - log_success "所有检查都通过了!" - return 0 - else - log_error "发现 $FAILED_FILES 个问题,请修复后重新运行" - return 1 - fi -} - -# 主函数 -main() { - log_info "开始代码质量检查..." - - check_shell_syntax - check_python_syntax - check_script_permissions - check_script_headers - check_config_syntax - - generate_report -} - -# 执行主函数 -main "$@" \ No newline at end of file diff --git a/scripts/ci-cd/quality/security-scan.sh b/scripts/ci-cd/quality/security-scan.sh deleted file mode 100755 index 6367d9b..0000000 --- a/scripts/ci-cd/quality/security-scan.sh +++ /dev/null @@ -1,142 +0,0 @@ -#!/bin/bash - -# 安全扫描脚本 -# 扫描代码中的安全问题和敏感信息 - -set -euo pipefail - -# 颜色定义 -RED='\033[0;31m' -GREEN='\033[0;32m' -YELLOW='\033[1;33m' -BLUE='\033[0;34m' -NC='\033[0m' # No Color - -# 计数器 -TOTAL_ISSUES=0 -HIGH_ISSUES=0 -MEDIUM_ISSUES=0 -LOW_ISSUES=0 - -# 日志函数 -log_info() { - echo -e "${BLUE}[INFO]${NC} $1" -} - -log_success() { - echo -e "${GREEN}[SUCCESS]${NC} $1" -} - -log_warning() { - echo -e "${YELLOW}[WARNING]${NC} $1" -} - -log_error() { - echo -e "${RED}[ERROR]${NC} $1" -} - -# 检查敏感信息泄露 -check_secrets() { - log_info "检查敏感信息泄露..." - - local patterns=( - "password\s*=\s*['\"][^'\"]*['\"]" - "token\s*=\s*['\"][^'\"]*['\"]" - "api_key\s*=\s*['\"][^'\"]*['\"]" - "secret\s*=\s*['\"][^'\"]*['\"]" - "private_key" - "-----BEGIN.*PRIVATE KEY-----" - ) - - local found_secrets=0 - - for pattern in "${patterns[@]}"; do - local matches - matches=$(grep -r -i -E "$pattern" . --exclude-dir=.git --exclude-dir=backups 2>/dev/null || true) - - if [ -n "$matches" ]; then - log_error "发现可能的敏感信息:" - echo "$matches" - ((found_secrets++)) - ((HIGH_ISSUES++)) - fi - done - - if [ "$found_secrets" -eq 0 ]; then - log_success "未发现明显的敏感信息泄露" - else - log_error "发现 $found_secrets 种类型的敏感信息,请检查并移除" - fi - - ((TOTAL_ISSUES += found_secrets)) -} - -# 检查不安全的命令使用 -check_unsafe_commands() { - log_info "检查不安全的命令使用..." - - local unsafe_patterns=( - "rm\s+-rf\s+/" - "chmod\s+777" - "curl.*-k" - "wget.*--no-check-certificate" - ) - - local unsafe_found=0 - - for pattern in "${unsafe_patterns[@]}"; do - local matches - matches=$(grep -r -E "$pattern" scripts/ 2>/dev/null || true) - - if [ -n "$matches" ]; then - log_warning "发现可能不安全的命令使用:" - echo "$matches" - ((unsafe_found++)) - ((MEDIUM_ISSUES++)) - fi - done - - if [ "$unsafe_found" -eq 0 ]; then - log_success "未发现明显不安全的命令使用" - else - log_warning "发现 $unsafe_found 个可能不安全的命令,请检查" - fi - - ((TOTAL_ISSUES += unsafe_found)) -} - -# 生成报告 -generate_report() { - log_info "生成安全扫描报告..." 
- - echo - echo "==================================" - echo " 安全扫描报告" - echo "==================================" - echo "总问题数: $TOTAL_ISSUES" - echo "高危: $HIGH_ISSUES" - echo "中危: $MEDIUM_ISSUES" - echo "低危: $LOW_ISSUES" - echo "==================================" - - if [ "$TOTAL_ISSUES" -eq 0 ]; then - log_success "安全扫描通过,未发现问题!" - return 0 - else - log_warning "发现 $TOTAL_ISSUES 个安全问题,请检查并修复" - return 1 - fi -} - -# 主函数 -main() { - log_info "开始安全扫描..." - - check_secrets - check_unsafe_commands - - generate_report -} - -# 执行主函数 -main "$@" \ No newline at end of file diff --git a/scripts/deploy-consul-to-nomad-servers.sh b/scripts/deploy-consul-to-nomad-servers.sh new file mode 100755 index 0000000..48fbee9 --- /dev/null +++ b/scripts/deploy-consul-to-nomad-servers.sh @@ -0,0 +1,58 @@ +#!/bin/bash + +# 为所有 Nomad Server 部署 Consul Client + +echo "🚀 部署 Consul Client 到所有 Nomad Server 节点" +echo "================================================" + +# 部署 Consul Client +echo "1. 部署 Consul Client..." +ansible-playbook -i ansible/inventory/hosts.yml \ + ansible/consul-client-deployment.yml \ + --limit nomad_servers + +if [ $? -eq 0 ]; then + echo "✅ Consul Client 部署成功" +else + echo "❌ Consul Client 部署失败" + exit 1 +fi + +# 更新 Nomad 配置 +echo "" +echo "2. 更新 Nomad Server 配置..." +echo "需要手动更新每个 Nomad Server 的配置:" +echo "" +echo "修改 /etc/nomad.d/nomad.hcl 中的 consul 块:" +echo "consul {" +echo " address = \"127.0.0.1:8500\" # 改为本地" +echo " server_service_name = \"nomad\"" +echo " client_service_name = \"nomad-client\"" +echo " auto_advertise = true" +echo " server_auto_join = true" +echo " client_auto_join = false" +echo "}" +echo "" +echo "然后重启 Nomad 服务:" +echo "systemctl restart nomad" + +echo "" +echo "3. 验证部署..." +sleep 5 + +# 验证 Consul Client +for server in semaphore ch3 ash1d ash2e ch2 de onecloud1; do + echo "检查 $server..." + if curl -s http://$server.tailnet-68f9.ts.net:8500/v1/status/leader > /dev/null 2>&1; then + echo "✅ $server - Consul Client 运行正常" + else + echo "❌ $server - Consul Client 无响应" + fi +done + +echo "" +echo "🎉 部署完成!" +echo "下一步:" +echo "1. 手动更新每个 Nomad Server 的配置文件" +echo "2. 重启 Nomad 服务" +echo "3. 验证 Nomad 与 Consul 的集成" diff --git a/scripts/deployment/consul/consul-variables-example.sh b/scripts/deployment/consul/consul-variables-example.sh deleted file mode 100755 index 0c47501..0000000 --- a/scripts/deployment/consul/consul-variables-example.sh +++ /dev/null @@ -1,217 +0,0 @@ -#!/bin/bash - -# Consul 变量和存储配置示例脚本 -# 此脚本展示了如何配置Consul的变量和存储功能 - -set -e - -# 配置参数 -CONSUL_ADDR=${CONSUL_ADDR:-"http://localhost:8500"} -ENVIRONMENT=${ENVIRONMENT:-"dev"} -PROVIDER=${PROVIDER:-"oracle"} -REGION=${REGION:-"kr"} - -echo "Consul 变量和存储配置示例" -echo "=========================" -echo "Consul 地址: $CONSUL_ADDR" -echo "环境: $ENVIRONMENT" -echo "提供商: $PROVIDER" -echo "区域: $REGION" -echo "" - -# 检查Consul连接 -check_consul_connection() { - echo "检查Consul连接..." - if curl -s "$CONSUL_ADDR/v1/status/leader" > /dev/null; then - echo "✓ Consul连接正常" - else - echo "✗ 无法连接到Consul,请检查Consul服务是否运行" - exit 1 - fi -} - -# 配置应用变量 -configure_app_variables() { - echo "配置应用变量..." 
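    # Every key below follows the repo's config/{environment}/{area}/{key}
    # layout. Consul's KV HTTP API stores the PUT body verbatim as the value
    # and returns it base64-encoded in the "Value" field on reads.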
- - # 应用基本信息 - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/app/name" -d "my-application" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/app/version" -d "1.0.0" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/app/environment" -d "$ENVIRONMENT" - - # 特性开关 - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/features/new_ui" -d "true" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/features/advanced_analytics" -d "false" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/features/beta_features" -d "true" - - echo "✓ 应用变量配置完成" -} - -# 配置数据库变量 -configure_database_variables() { - echo "配置数据库变量..." - - # 数据库连接信息 - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/database/host" -d "db.example.com" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/database/port" -d "5432" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/database/name" -d "myapp_db" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/database/ssl_mode" -d "require" - - # 数据库连接池配置 - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/database/max_connections" -d "100" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/database/min_connections" -d "10" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/database/connection_timeout" -d "30s" - - echo "✓ 数据库变量配置完成" -} - -# 配置缓存变量 -configure_cache_variables() { - echo "配置缓存变量..." - - # Redis配置 - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/cache/host" -d "redis.example.com" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/cache/port" -d "6379" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/cache/password" -d "secure_password" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/cache/db" -d "0" - - # 缓存策略 - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/cache/ttl" -d "3600" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/cache/max_memory" -d "2gb" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/cache/eviction_policy" -d "allkeys-lru" - - echo "✓ 缓存变量配置完成" -} - -# 配置消息队列变量 -configure_messaging_variables() { - echo "配置消息队列变量..." - - # RabbitMQ配置 - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/messaging/host" -d "rabbitmq.example.com" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/messaging/port" -d "5672" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/messaging/username" -d "myapp" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/messaging/password" -d "secure_password" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/messaging/vhost" -d "/myapp" - - # 队列配置 - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/messaging/queue_name" -d "tasks" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/messaging/exchange" -d "myapp_exchange" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/messaging/routing_key" -d "task.#" - - echo "✓ 消息队列变量配置完成" -} - -# 配置云服务提供商变量 -configure_provider_variables() { - echo "配置云服务提供商变量..." 
- - if [ "$PROVIDER" = "oracle" ]; then - # Oracle Cloud配置 - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$region/tenancy_ocid" -d "ocid1.tenancy.oc1..aaaaaaaayourtenancyocid" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$region/user_ocid" -d "ocid1.user.oc1..aaaaaaaayouruserocid" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$region/fingerprint" -d "your-fingerprint" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$region/region" -d "$REGION" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$region/compartment_id" -d "ocid1.compartment.oc1..aaaaaaaayourcompartmentid" - elif [ "$PROVIDER" = "aws" ]; then - # AWS配置 - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$region/access_key" -d "your-access-key" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$region/secret_key" -d "your-secret-key" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$region/region" -d "$REGION" - elif [ "$PROVIDER" = "gcp" ]; then - # GCP配置 - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$region/project_id" -d "your-project-id" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$region/region" -d "$REGION" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$region/credentials_path" -d "/path/to/service-account.json" - elif [ "$PROVIDER" = "digitalocean" ]; then - # DigitalOcean配置 - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$region/token" -d "your-do-token" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/$region/region" -d "$REGION" - fi - - echo "✓ 云服务提供商变量配置完成" -} - -# 配置存储相关变量 -configure_storage_variables() { - echo "配置存储相关变量..." - - # 快照配置 - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/snapshot/enabled" -d "true" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/snapshot/interval" -d "24h" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/snapshot/retain" -d "30" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/snapshot/name" -d "consul-snapshot-{{.Timestamp}}" - - # 备份配置 - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/backup/enabled" -d "true" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/backup/interval" -d "6h" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/backup/retain" -d "7" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/backup/name" -d "consul-backup-{{.Timestamp}}" - - # 数据目录配置 - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/data_dir" -d "/opt/consul/data" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/raft_dir" -d "/opt/consul/raft" - - # Autopilot配置 - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/autopilot/cleanup_dead_servers" -d "true" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/autopilot/last_contact_threshold" -d "200ms" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/autopilot/max_trailing_logs" -d "250" - curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/autopilot/server_stabilization_time" -d "10s" - - echo "✓ 存储相关变量配置完成" -} - -# 显示配置结果 -display_configuration() { - echo "" - echo "配置结果:" - echo "=========" - - echo "应用配置:" - curl -s "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/app/?recurse" | jq -r '.[] | "\(.Key): \(.Value | @base64d)"' 2>/dev/null || echo " (需要安装jq以查看格式化输出)" - - echo "" - echo "数据库配置:" - curl -s "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/database/?recurse" | jq -r '.[] | "\(.Key): 
\(.Value | @base64d)"' 2>/dev/null || echo " (需要安装jq以查看格式化输出)" - - echo "" - echo "缓存配置:" - curl -s "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/cache/?recurse" | jq -r '.[] | "\(.Key): \(.Value | @base64d)"' 2>/dev/null || echo " (需要安装jq以查看格式化输出)" - - echo "" - echo "消息队列配置:" - curl -s "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/messaging/?recurse" | jq -r '.[] | "\(.Key): \(.Value | @base64d)"' 2>/dev/null || echo " (需要安装jq以查看格式化输出)" - - echo "" - echo "云服务提供商配置:" - curl -s "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/$PROVIDER/?recurse" | jq -r '.[] | "\(.Key): \(.Value | @base64d)"' 2>/dev/null || echo " (需要安装jq以查看格式化输出)" - - echo "" - echo "存储配置:" - curl -s "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/storage/?recurse" | jq -r '.[] | "\(.Key): \(.Value | @base64d)"' 2>/dev/null || echo " (需要安装jq以查看格式化输出)" -} - -# 主函数 -main() { - check_consul_connection - configure_app_variables - configure_database_variables - configure_cache_variables - configure_messaging_variables - configure_provider_variables - configure_storage_variables - display_configuration - - echo "" - echo "✓ 所有变量和存储配置已完成!" - echo "" - echo "使用说明:" - echo "1. 在Terraform中使用consul_keys数据源获取这些配置" - echo "2. 在应用程序中使用Consul客户端库读取这些配置" - echo "3. 使用Consul UI查看和管理这些配置" - echo "" - echo "配置文件位置: /root/mgmt/docs/setup/consul_variables_and_storage_guide.md" -} - -# 执行主函数 -main "$@" \ No newline at end of file diff --git a/scripts/deployment/consul/deploy-consul-cluster-kv.sh b/scripts/deployment/consul/deploy-consul-cluster-kv.sh deleted file mode 100755 index 793371f..0000000 --- a/scripts/deployment/consul/deploy-consul-cluster-kv.sh +++ /dev/null @@ -1,117 +0,0 @@ -#!/bin/bash - -# Consul集群部署脚本 - 遵循最佳变量命名规范 -# 此脚本将部署一个完全遵循 config/{environment}/{provider}/{region_or_service}/{key} 格式的Consul集群 - -set -e - -# 配置参数 -CONSUL_ADDR="${CONSUL_ADDR:-localhost:8500}" -ENVIRONMENT="${ENVIRONMENT:-dev}" -NOMAD_ADDR="${NOMAD_ADDR:-localhost:4646}" -CONSUL_CONFIG_DIR="${CONSUL_CONFIG_DIR:-/root/mgmt/components/consul/configs}" -CONSUL_JOBS_DIR="${CONSUL_JOBS_DIR:-/root/mgmt/components/consul/jobs}" - -echo "开始部署遵循最佳变量命名规范的Consul集群..." -echo "Consul地址: $CONSUL_ADDR" -echo "Nomad地址: $NOMAD_ADDR" -echo "环境: $ENVIRONMENT" - -# 检查Consul连接 -echo "检查Consul连接..." -if ! curl -s "$CONSUL_ADDR/v1/status/leader" | grep -q "."; then - echo "错误: 无法连接到Consul服务器 $CONSUL_ADDR" - exit 1 -fi -echo "Consul连接成功" - -# 检查Nomad连接 -echo "检查Nomad连接..." -if ! curl -s "$NOMAD_ADDR/v1/status/leader" | grep -q "."; then - echo "错误: 无法连接到Nomad服务器 $NOMAD_ADDR" - exit 1 -fi -echo "Nomad连接成功" - -# 步骤1: 设置Consul变量 -echo "步骤1: 设置Consul变量..." -/root/mgmt/deployment/scripts/setup_consul_cluster_variables.sh - -# 步骤2: 生成Consul配置文件 -echo "步骤2: 生成Consul配置文件..." -/root/mgmt/deployment/scripts/generate_consul_config.sh - -# 步骤3: 停止现有的Consul集群 -echo "步骤3: 停止现有的Consul集群..." -if nomad job status consul-cluster-simple 2>/dev/null; then - nomad job stop consul-cluster-simple - echo "已停止现有的consul-cluster-simple作业" -fi - -if nomad job status consul-cluster-dynamic 2>/dev/null; then - nomad job stop consul-cluster-dynamic - echo "已停止现有的consul-cluster-dynamic作业" -fi - -if nomad job status consul-cluster-kv 2>/dev/null; then - nomad job stop consul-cluster-kv - echo "已停止现有的consul-cluster-kv作业" -fi - -# 步骤4: 部署新的Consul集群 -echo "步骤4: 部署新的Consul集群..." -nomad job run $CONSUL_JOBS_DIR/consul-cluster-kv.nomad - -# 步骤5: 验证部署 -echo "步骤5: 验证部署..." 
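# The fixed sleep below can race the scheduler; this bounded poll (a sketch
# using the same nomad CLI context as above) waits up to ~60s for the job
# to report "running" before the explicit checks run.
for _ in $(seq 1 12); do
    nomad job status consul-cluster-kv 2>/dev/null | grep -q running && break
    sleep 5
done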
-sleep 10 - -# 检查作业状态 -if nomad job status consul-cluster-kv | grep -q "running"; then - echo "Consul集群作业正在运行" -else - echo "错误: Consul集群作业未运行" - exit 1 -fi - -# 检查Consul集群状态 -if curl -s "$CONSUL_ADDR/v1/status/leader" | grep -q "."; then - echo "Consul集群leader已选举" -else - echo "错误: Consul集群leader未选举" - exit 1 -fi - -# 检查节点数量 -NODE_COUNT=$(curl -s "$CONSUL_ADDR/v1/status/peers" | jq '. | length') -if [ "$NODE_COUNT" -eq 3 ]; then - echo "Consul集群节点数量正确: $NODE_COUNT" -else - echo "警告: Consul集群节点数量不正确: $NODE_COUNT (期望: 3)" -fi - -# 步骤6: 验证变量配置 -echo "步骤6: 验证变量配置..." - -# 检查一些关键变量 -if curl -s "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/cluster/datacenter" | jq -r '.[].Value' | base64 -d | grep -q "dc1"; then - echo "Consul数据中心配置正确" -else - echo "警告: Consul数据中心配置可能不正确" -fi - -if curl -s "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/nodes/master/ip" | jq -r '.[].Value' | base64 -d | grep -q "100.117.106.136"; then - echo "Consul master节点IP配置正确" -else - echo "警告: Consul master节点IP配置可能不正确" -fi - -# 步骤7: 显示访问信息 -echo "步骤7: 显示访问信息..." -echo "Consul UI地址: http://100.117.106.136:8500" -echo "Consul API地址: http://100.117.106.136:8500/v1" -echo "Nomad UI地址: http://100.117.106.136:4646" -echo "Nomad API地址: http://100.117.106.136:4646/v1" - -echo "Consul集群部署完成!" -echo "集群现在完全遵循最佳变量命名规范: config/{environment}/{provider}/{region_or_service}/{key}" \ No newline at end of file diff --git a/scripts/deployment/vault/deploy-vault.sh b/scripts/deployment/vault/deploy-vault.sh deleted file mode 100755 index 5f58ac3..0000000 --- a/scripts/deployment/vault/deploy-vault.sh +++ /dev/null @@ -1,143 +0,0 @@ -#!/bin/bash -# 部署Vault集群的脚本 - -# 检查并安装Vault -if ! which vault >/dev/null; then - echo "==== 安装Vault ====" - VAULT_VERSION="1.20.4" - wget -q https://releases.hashicorp.com/vault/${VAULT_VERSION}/vault_${VAULT_VERSION}_linux_amd64.zip - unzip -q vault_${VAULT_VERSION}_linux_amd64.zip - sudo mv vault /usr/local/bin/ - rm vault_${VAULT_VERSION}_linux_amd64.zip -fi - -export PATH=$PATH:/usr/local/bin - -set -e - -echo "===== 开始部署Vault集群 =====" - -# 目录定义 -SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" -ROOT_DIR="$(dirname "$SCRIPT_DIR")" -ANSIBLE_DIR="$ROOT_DIR/playbooks" -JOBS_DIR="$ROOT_DIR/components/vault/jobs" - -# 颜色定义 -GREEN='\033[0;32m' -YELLOW='\033[1;33m' -RED='\033[0;31m' -NC='\033[0m' # No Color - -# 函数定义 -log_info() { - echo -e "${GREEN}[INFO]${NC} $1" -} - -log_warn() { - echo -e "${YELLOW}[WARN]${NC} $1" -} - -log_error() { - echo -e "${RED}[ERROR]${NC} $1" -} - -# 检查命令是否存在 -check_command() { - if ! command -v $1 &> /dev/null; then - log_error "$1 命令未找到,请先安装" - exit 1 - fi -} - -# 检查必要的命令 -check_command ansible-playbook -check_command nomad -check_command vault - -# 步骤1: 使用Ansible安装Vault -log_info "步骤1: 使用Ansible安装Vault..." -ansible-playbook -i "$ANSIBLE_DIR/inventories/production/vault.ini" "$ANSIBLE_DIR/playbooks/install/install_vault.yml" - -# 步骤2: 部署Vault Nomad作业 -log_info "步骤2: 部署Vault Nomad作业..." -nomad job run "$JOBS_DIR/vault-cluster-exec.nomad" - -# 等待Nomad作业部署完成 -log_info "等待Nomad作业部署完成..." -sleep 10 - -# 检查Nomad作业状态 -nomad_status=$(nomad job status vault-cluster-exec | grep Status | head -1 | awk '{print $2}') -if [ "$nomad_status" != "running" ]; then - log_warn "Vault Nomad作业状态不是'running',当前状态: $nomad_status" - log_info "请检查Nomad作业状态: nomad job status vault-cluster-exec" -fi - -# 步骤3: 检查Vault状态并初始化(如果需要) -log_info "步骤3: 检查Vault状态..." -export VAULT_ADDR='http://127.0.0.1:8200' - -# 等待Vault启动 -log_info "等待Vault启动..." 
-for i in {1..30}; do - if curl -s "$VAULT_ADDR/v1/sys/health" > /dev/null; then - break - fi - echo -n "." - sleep 2 -done -echo "" - -# 检查Vault是否已初始化 -init_status=$(curl -s "$VAULT_ADDR/v1/sys/health" | grep -o '"initialized":[^,}]*' | cut -d ':' -f2) -if [ "$init_status" = "false" ]; then - log_info "Vault未初始化,正在初始化..." - - # 初始化Vault并保存密钥 - mkdir -p "$ROOT_DIR/security/secrets/vault" - vault operator init -key-shares=5 -key-threshold=3 -format=json > "$ROOT_DIR/security/secrets/vault/init_keys.json" - - if [ $? -eq 0 ]; then - log_info "Vault初始化成功,解封密钥和根令牌已保存到 $ROOT_DIR/security/secrets/vault/init_keys.json" - log_warn "请确保安全保存这些密钥!" - - # 提取解封密钥 - unseal_key1=$(cat "$ROOT_DIR/security/secrets/vault/init_keys.json" | grep -o '"unseal_keys_b64":\[\([^]]*\)' | sed 's/"unseal_keys_b64":\[//g' | tr ',' '\n' | sed 's/"//g' | head -1) - unseal_key2=$(cat "$ROOT_DIR/security/secrets/vault/init_keys.json" | grep -o '"unseal_keys_b64":\[\([^]]*\)' | sed 's/"unseal_keys_b64":\[//g' | tr ',' '\n' | sed 's/"//g' | head -2 | tail -1) - unseal_key3=$(cat "$ROOT_DIR/security/secrets/vault/init_keys.json" | grep -o '"unseal_keys_b64":\[\([^]]*\)' | sed 's/"unseal_keys_b64":\[//g' | tr ',' '\n' | sed 's/"//g' | head -3 | tail -1) - - # 解封Vault - log_info "正在解封Vault..." - vault operator unseal "$unseal_key1" - vault operator unseal "$unseal_key2" - vault operator unseal "$unseal_key3" - - log_info "Vault已成功解封" - else - log_error "Vault初始化失败" - exit 1 - fi -else - log_info "Vault已初始化" - - # 检查Vault是否已解封 - sealed_status=$(curl -s "$VAULT_ADDR/v1/sys/health" | grep -o '"sealed":[^,}]*' | cut -d ':' -f2) - if [ "$sealed_status" = "true" ]; then - log_warn "Vault已初始化但仍处于密封状态,请手动解封" - log_info "使用以下命令解封Vault:" - log_info "export VAULT_ADDR='http://127.0.0.1:8200'" - log_info "vault operator unseal <解封密钥1>" - log_info "vault operator unseal <解封密钥2>" - log_info "vault operator unseal <解封密钥3>" - else - log_info "Vault已初始化且已解封,可以正常使用" - fi -fi - -# 显示Vault状态 -log_info "Vault状态:" -vault status - -log_info "===== Vault集群部署完成 =====" -log_info "请在其他节点上运行解封操作,确保集群完全可用" \ No newline at end of file diff --git a/scripts/deployment/vault/vault-dev-example.sh b/scripts/deployment/vault/vault-dev-example.sh deleted file mode 100755 index a2da0a8..0000000 --- a/scripts/deployment/vault/vault-dev-example.sh +++ /dev/null @@ -1,50 +0,0 @@ -#!/bin/bash -# Vault开发环境使用示例 - -echo "===== Vault开发环境使用示例 =====" - -# 设置环境变量 -source /root/mgmt/security/secrets/vault/dev/vault_env.sh - -echo "1. 检查Vault状态" -vault status - -echo "" -echo "2. 写入示例密钥值" -vault kv put secret/myapp/config username="devuser" password="devpassword" database="devdb" - -echo "" -echo "3. 读取示例密钥值" -vault kv get secret/myapp/config - -echo "" -echo "4. 列出密钥路径" -vault kv list secret/myapp/ - -echo "" -echo "5. 创建示例策略" -cat > /tmp/dev-policy.hcl << EOF -# 开发环境示例策略 -path "secret/*" { - capabilities = ["create", "read", "update", "delete", "list"] -} - -path "sys/mounts" { - capabilities = ["read"] -} -EOF - -vault policy write dev-policy /tmp/dev-policy.hcl - -echo "" -echo "6. 创建有限权限令牌" -vault token create -policy=dev-policy - -echo "" -echo "7. 
启用并配置其他密钥引擎示例" -echo "启用数据库密钥引擎:" -echo "vault secrets enable database" - -echo "" -echo "===== Vault开发环境示例完成 =====" -echo "注意:这些命令仅用于开发测试,请勿在生产环境中使用相同配置" \ No newline at end of file diff --git a/scripts/deployment/vault/vault-dev-quickstart.sh b/scripts/deployment/vault/vault-dev-quickstart.sh deleted file mode 100755 index f95421b..0000000 --- a/scripts/deployment/vault/vault-dev-quickstart.sh +++ /dev/null @@ -1,56 +0,0 @@ -#!/bin/bash -# Vault开发环境快速开始指南 - -echo "===== Vault开发环境快速开始 =====" - -# 1. 设置环境变量 -echo "1. 设置环境变量" -source /root/mgmt/security/secrets/vault/dev/vault_env.sh -echo "VAULT_ADDR: $VAULT_ADDR" -echo "VAULT_TOKEN: $VAULT_TOKEN" - -# 2. 检查Vault状态 -echo "" -echo "2. 检查Vault状态" -vault status - -# 3. 存储密钥值 -echo "" -echo "3. 存储密钥值" -vault kv put secret/example/api_key value="my_secret_api_key_12345" - -# 4. 读取密钥值 -echo "" -echo "4. 读取密钥值" -vault kv get secret/example/api_key - -# 5. 列出密钥路径 -echo "" -echo "5. 列出密钥路径" -vault kv list secret/example/ - -# 6. 创建策略示例 -echo "" -echo "6. 创建示例策略" -cat > /tmp/example-policy.hcl << EOF -# 示例策略 - 允许读取secret/example路径下的密钥 -path "secret/example/*" { - capabilities = ["read", "list"] -} - -# 允许列出密钥引擎 -path "sys/mounts" { - capabilities = ["read"] -} -EOF - -vault policy write example-policy /tmp/example-policy.hcl - -# 7. 创建有限权限令牌 -echo "" -echo "7. 创建有限权限令牌" -vault token create -policy=example-policy - -echo "" -echo "===== Vault开发环境快速开始完成 =====" -echo "您现在可以开始在开发环境中使用Vault了!" \ No newline at end of file diff --git a/scripts/diagnose-consul-sync.sh b/scripts/diagnose-consul-sync.sh new file mode 100755 index 0000000..aeddc0f --- /dev/null +++ b/scripts/diagnose-consul-sync.sh @@ -0,0 +1,62 @@ +#!/bin/bash + +# Consul 集群同步诊断脚本 + +echo "=== Consul 集群同步诊断 ===" +echo "时间: $(date)" +echo "" + +CONSUL_NODES=( + "master.tailnet-68f9.ts.net:8500" + "warden.tailnet-68f9.ts.net:8500" + "ash3c.tailnet-68f9.ts.net:8500" +) + +echo "1. 检查集群状态" +echo "==================" +for node in "${CONSUL_NODES[@]}"; do + echo "节点: $node" + echo " Leader: $(curl -s http://$node/v1/status/leader 2>/dev/null || echo 'ERROR')" + echo " Peers: $(curl -s http://$node/v1/status/peers 2>/dev/null | jq length 2>/dev/null || echo 'ERROR')" + echo "" +done + +echo "2. 检查服务注册" +echo "================" +for node in "${CONSUL_NODES[@]}"; do + echo "节点: $node" + echo " Catalog 服务:" + curl -s http://$node/v1/catalog/services 2>/dev/null | jq -r 'keys[]' 2>/dev/null | grep -E "(consul-lb|traefik)" | sed 's/^/ /' || echo " ERROR 或无服务" + + echo " Agent 服务:" + curl -s http://$node/v1/agent/services 2>/dev/null | jq -r 'keys[]' 2>/dev/null | grep -E "traefik" | sed 's/^/ /' || echo " 无本地服务" + echo "" +done + +echo "3. 检查健康状态" +echo "================" +for node in "${CONSUL_NODES[@]}"; do + echo "节点: $node" + checks=$(curl -s http://$node/v1/agent/checks 2>/dev/null) + if [ $? -eq 0 ]; then + echo "$checks" | jq -r 'to_entries[] | select(.key | contains("traefik")) | " \(.key): \(.value.Status)"' 2>/dev/null || echo " 无 Traefik 健康检查" + else + echo " ERROR: 无法连接" + fi + echo "" +done + +echo "4. 网络连通性测试" +echo "==================" +echo "测试从当前节点到 Traefik 的连接:" +curl -s -w " HTTP %{http_code} - 响应时间: %{time_total}s\n" -o /dev/null http://100.97.62.111:80/ || echo " ERROR: 无法连接到 Traefik" +curl -s -w " HTTP %{http_code} - 响应时间: %{time_total}s\n" -o /dev/null http://100.97.62.111:8080/api/overview || echo " ERROR: 无法连接到 Traefik Dashboard" + +echo "" +echo "5. 建议操作" +echo "===========" +echo "如果发现问题:" +echo " 1. 重新注册服务: ./scripts/register-traefik-to-all-consul.sh" +echo " 2. 
检查 Consul 日志: nomad alloc logs \$(nomad job allocs consul-cluster-nomad | grep warden | awk '{print \$1}') consul" +echo " 3. 重启有问题的 Consul 节点" +echo " 4. 检查网络连通性和防火墙设置" diff --git a/scripts/mcp/configs/sync-all-configs.sh b/scripts/mcp/configs/sync-all-configs.sh deleted file mode 100755 index 2faafec..0000000 --- a/scripts/mcp/configs/sync-all-configs.sh +++ /dev/null @@ -1,87 +0,0 @@ -#!/bin/bash - -# 链接所有MCP配置文件的脚本 -# 该脚本将所有IDE和AI助手的MCP配置链接到NFS共享的配置文件 - -NFS_CONFIG="/mnt/fnsync/mcp/mcp_shared_config.json" - -echo "链接所有MCP配置文件到NFS共享配置..." - -# 检查NFS配置文件是否存在 -if [ ! -f "$NFS_CONFIG" ]; then - echo "错误: NFS配置文件不存在: $NFS_CONFIG" - exit 1 -fi - -echo "✓ 使用NFS共享配置作为基准: $NFS_CONFIG" - -# 定义所有可能的MCP配置位置 -CONFIGS=( - # Kilo Code IDE (全局配置,移除了项目级别配置以避免冲突) - "../.trae-server/data/User/globalStorage/kilocode.kilo-code/settings/mcp_settings.json" - - # Tencent CodeBuddy - "$HOME/.codebuddy-server/data/User/globalStorage/tencent.planning-genie/settings/codebuddy_mcp_settings.json" - "$HOME/.codebuddy/data/User/globalStorage/tencent.planning-genie/settings/codebuddy_mcp_settings.json" - # 新增的CodeBuddy-CN - "$HOME/.codebuddy-server-cn/data/User/globalStorage/tencent.planning-genie/settings/codebuddy_mcp_settings.json" - - # Claude相关 - "$HOME/.claude.json" - "$HOME/.claude.json.backup" - "$HOME/.config/claude/settings/mcp_settings.json" - - # Cursor - "$HOME/.cursor-server/data/User/globalStorage/xxx.cursor/settings/mcp_settings.json" - - # Qoder - "$HOME/.qoder-server/data/User/globalStorage/xxx.qoder/settings/mcp_settings.json" - - # Cline - "$HOME/.codebuddy-server/data/User/globalStorage/rooveterinaryinc.roo-cline/settings/mcp_settings.json" - "$HOME/Cline/settings/mcp_settings.json" - - # Kiro - "$HOME/.kiro-server/data/User/globalStorage/xxx.kiro/settings/mcp_settings.json" - - # Qwen - "$HOME/.qwen/settings/mcp_settings.json" - - # VSCodium - "$HOME/.vscodium-server/data/User/globalStorage/xxx.vscodium/settings/mcp_settings.json" - - # Other potential locations - ".kilocode/mcp.json" - "$HOME/.config/Qoder/SharedClientCache/mcp.json" - "$HOME/.trae-server/data/Machine/mcp.json" - "$HOME/.trae-cn-server/data/Machine/mcp.json" - "$HOME/.codegeex/agent/configs/user_mcp_config.json" - "$HOME/.codegeex/agent/configs/mcp_config.json" -) - -# 链接到每个配置位置 -for config_path in "${CONFIGS[@]}"; do - if [ -n "$config_path" ]; then - config_dir=$(dirname "$config_path") - if [ -d "$config_dir" ]; then - # 如果目标文件已存在,先备份 - if [ -f "$config_path" ]; then - mv "$config_path" "${config_path}.backup" - echo "✓ 原配置文件已备份: ${config_path}.backup" - fi - - # 创建符号链接 - ln -s "$NFS_CONFIG" "$config_path" 2>/dev/null - if [ $? -eq 0 ]; then - echo "✓ 已创建链接到: $config_path" - else - echo "✗ 创建链接失败: $config_path" - fi - else - echo "✗ 目录不存在: $config_dir" - fi - fi -done - -echo "所有MCP配置链接完成!" 
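# Audit pass (a sketch, reusing the CONFIGS array defined above): report any
# path that exists but does not resolve to the shared NFS config file.
for config_path in "${CONFIGS[@]}"; do
    if [ -e "$config_path" ] && [ "$(readlink -f "$config_path")" != "$(readlink -f "$NFS_CONFIG")" ]; then
        echo "⚠ not linked to the shared config: $config_path"
    fi
done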
-echo "所有IDE和AI助手现在都使用NFS共享的MCP配置文件: $NFS_CONFIG" \ No newline at end of file diff --git a/scripts/mcp/servers/qdrant-mcp-server.py b/scripts/mcp/servers/qdrant-mcp-server.py deleted file mode 100755 index 3a2644a..0000000 --- a/scripts/mcp/servers/qdrant-mcp-server.py +++ /dev/null @@ -1,380 +0,0 @@ -#!/usr/bin/env python3 -""" -Qdrant MCP 服务器 -此脚本实现了一个 MCP 服务器,与 Qdrant 向量数据库集成 -""" - -import asyncio -import json -import os -import sys -from typing import Any, Dict, List, Optional -import logging - -from qdrant_client import QdrantClient -from qdrant_client.models import Distance, VectorParams, PointStruct, Filter - -# 设置日志 -logging.basicConfig(level=logging.INFO) -logger = logging.getLogger(__name__) - -class QdrantMCPServer: - def __init__(self): - # 从环境变量获取配置 - self.qdrant_url = os.getenv("QDRANT_URL", "http://localhost:6333") - self.qdrant_api_key = os.getenv("QDRANT_API_KEY", "") - self.collection_name = os.getenv("COLLECTION_NAME", "mcp") - self.embedding_model = os.getenv("EMBEDDING_MODEL", "bge-m3") - - # 初始化 Qdrant 客户端 - self.client = QdrantClient( - url=self.qdrant_url, - api_key=self.qdrant_api_key if self.qdrant_api_key else None - ) - - # 确保集合存在 - self._ensure_collection_exists() - - logger.info(f"Qdrant MCP 服务器已初始化") - logger.info(f"Qdrant URL: {self.qdrant_url}") - logger.info(f"集合名称: {self.collection_name}") - logger.info(f"嵌入模型: {self.embedding_model}") - - def _ensure_collection_exists(self): - """确保集合存在,如果不存在则创建""" - try: - collections = self.client.get_collections().collections - collection_names = [collection.name for collection in collections] - - if self.collection_name not in collection_names: - # 创建新集合 - self.client.create_collection( - collection_name=self.collection_name, - vectors_config=VectorParams(size=1024, distance=Distance.COSINE) - ) - logger.info(f"已创建新集合: {self.collection_name}") - else: - logger.info(f"集合已存在: {self.collection_name}") - except Exception as e: - logger.error(f"确保集合存在时出错: {e}") - raise - - async def handle_request(self, request: Dict[str, Any]) -> Dict[str, Any]: - """处理 MCP 请求""" - method = request.get("method") - params = request.get("params", {}) - request_id = request.get("id") - - logger.info(f"收到请求: {method}") - - try: - if method == "initialize": - result = await self.initialize(params) - elif method == "tools/list": - result = await self.list_tools(params) - elif method == "tools/call": - result = await self.call_tool(params) - elif method == "resources/list": - result = await self.list_resources(params) - elif method == "resources/read": - result = await self.read_resource(params) - else: - result = { - "error": { - "code": -32601, - "message": f"未知方法: {method}" - } - } - except Exception as e: - logger.error(f"处理请求时出错: {e}") - result = { - "error": { - "code": -32603, - "message": f"内部错误: {str(e)}" - } - } - - response = { - "jsonrpc": "2.0", - "id": request_id, - **result - } - - return response - - async def initialize(self, params: Dict[str, Any]) -> Dict[str, Any]: - """初始化 MCP 服务器""" - logger.info("初始化 Qdrant MCP 服务器") - - return { - "result": { - "protocolVersion": "2024-11-05", - "capabilities": { - "tools": { - "listChanged": False - }, - "resources": { - "subscribe": False, - "listChanged": False - } - }, - "serverInfo": { - "name": "qdrant-mcp-server", - "version": "1.0.0" - } - } - } - - async def list_tools(self, params: Dict[str, Any]) -> Dict[str, Any]: - """列出可用工具""" - return { - "result": { - "tools": [ - { - "name": "qdrant_search", - "description": "在 Qdrant 中搜索相似向量", - "inputSchema": { - "type": "object", - 
"properties": { - "query": { - "type": "string", - "description": "搜索查询文本" - }, - "limit": { - "type": "integer", - "default": 5, - "description": "返回结果数量限制" - } - }, - "required": ["query"] - } - }, - { - "name": "qdrant_add", - "description": "向 Qdrant 添加向量", - "inputSchema": { - "type": "object", - "properties": { - "text": { - "type": "string", - "description": "要添加的文本内容" - }, - "metadata": { - "type": "object", - "description": "与文本关联的元数据" - } - }, - "required": ["text"] - } - }, - { - "name": "qdrant_delete", - "description": "从 Qdrant 删除向量", - "inputSchema": { - "type": "object", - "properties": { - "id": { - "type": "string", - "description": "要删除的向量ID" - } - }, - "required": ["id"] - } - } - ] - } - } - - async def call_tool(self, params: Dict[str, Any]) -> Dict[str, Any]: - """调用工具""" - name = params.get("name") - arguments = params.get("arguments", {}) - - if name == "qdrant_search": - return await self._search_vectors(arguments) - elif name == "qdrant_add": - return await self._add_vector(arguments) - elif name == "qdrant_delete": - return await self._delete_vector(arguments) - else: - return { - "error": { - "code": -32601, - "message": f"未知工具: {name}" - } - } - - async def _search_vectors(self, params: Dict[str, Any]) -> Dict[str, Any]: - """搜索相似向量""" - query = params.get("query", "") - limit = params.get("limit", 5) - - # 这里应该使用嵌入模型将查询转换为向量 - # 由于我们没有实际的嵌入模型,这里使用一个简单的模拟 - query_vector = [0.1] * 1024 # 模拟向量 - - try: - search_result = self.client.search( - collection_name=self.collection_name, - query_vector=query_vector, - limit=limit - ) - - results = [] - for hit in search_result: - results.append({ - "id": hit.id, - "score": hit.score, - "payload": hit.payload - }) - - return { - "result": { - "content": [ - { - "type": "text", - "text": f"搜索结果: {json.dumps(results, ensure_ascii=False)}" - } - ] - } - } - except Exception as e: - logger.error(f"搜索向量时出错: {e}") - return { - "error": { - "code": -32603, - "message": f"搜索向量时出错: {str(e)}" - } - } - - async def _add_vector(self, params: Dict[str, Any]) -> Dict[str, Any]: - """添加向量""" - text = params.get("text", "") - metadata = params.get("metadata", {}) - - # 生成一个简单的ID - import hashlib - vector_id = hashlib.md5(text.encode()).hexdigest() - - # 这里应该使用嵌入模型将文本转换为向量 - # 由于我们没有实际的嵌入模型,这里使用一个简单的模拟 - vector = [0.1] * 1024 # 模拟向量 - - try: - self.client.upsert( - collection_name=self.collection_name, - points=[ - PointStruct( - id=vector_id, - vector=vector, - payload={ - "text": text, - **metadata - } - ) - ] - ) - - return { - "result": { - "content": [ - { - "type": "text", - "text": f"已添加向量,ID: {vector_id}" - } - ] - } - } - except Exception as e: - logger.error(f"添加向量时出错: {e}") - return { - "error": { - "code": -32603, - "message": f"添加向量时出错: {str(e)}" - } - } - - async def _delete_vector(self, params: Dict[str, Any]) -> Dict[str, Any]: - """删除向量""" - vector_id = params.get("id", "") - - try: - self.client.delete( - collection_name=self.collection_name, - points_selector=[vector_id] - ) - - return { - "result": { - "content": [ - { - "type": "text", - "text": f"已删除向量,ID: {vector_id}" - } - ] - } - } - except Exception as e: - logger.error(f"删除向量时出错: {e}") - return { - "error": { - "code": -32603, - "message": f"删除向量时出错: {str(e)}" - } - } - - async def list_resources(self, params: Dict[str, Any]) -> Dict[str, Any]: - """列出资源""" - return { - "result": { - "resources": [] - } - } - - async def read_resource(self, params: Dict[str, Any]) -> Dict[str, Any]: - """读取资源""" - return { - "error": { - "code": -32601, - "message": "不支持读取资源" - 
} - } - -async def main(): - """主函数""" - server = QdrantMCPServer() - - # 从标准输入读取请求 - for line in sys.stdin: - try: - request = json.loads(line) - response = await server.handle_request(request) - print(json.dumps(response, ensure_ascii=False)) - sys.stdout.flush() - except json.JSONDecodeError as e: - logger.error(f"解析 JSON 时出错: {e}") - error_response = { - "jsonrpc": "2.0", - "id": None, - "error": { - "code": -32700, - "message": f"解析 JSON 时出错: {str(e)}" - } - } - print(json.dumps(error_response, ensure_ascii=False)) - sys.stdout.flush() - except Exception as e: - logger.error(f"处理请求时出错: {e}") - error_response = { - "jsonrpc": "2.0", - "id": None, - "error": { - "code": -32603, - "message": f"内部错误: {str(e)}" - } - } - print(json.dumps(error_response, ensure_ascii=False)) - sys.stdout.flush() - -if __name__ == "__main__": - asyncio.run(main()) \ No newline at end of file diff --git a/scripts/mcp/servers/qdrant-ollama-integration.py b/scripts/mcp/servers/qdrant-ollama-integration.py deleted file mode 100755 index f2af4f3..0000000 --- a/scripts/mcp/servers/qdrant-ollama-integration.py +++ /dev/null @@ -1,117 +0,0 @@ -#!/usr/bin/env python3 -""" -Qdrant 与 Ollama 嵌入模型集成示例 -此脚本演示如何使用 Ollama 作为嵌入模型提供者与 Qdrant 向量数据库集成 -""" - -from langchain_ollama import OllamaEmbeddings -from qdrant_client import QdrantClient -from qdrant_client.models import Distance, VectorParams, PointStruct -import os - -def main(): - # 1. 初始化 Ollama 嵌入模型 - # 使用 nomic-embed-text 模型,这是 Ollama 推荐的嵌入模型 - print("初始化 Ollama 嵌入模型...") - embeddings = OllamaEmbeddings( - model="nomic-embed-text", - base_url="http://localhost:11434" # Ollama 默认地址 - ) - - # 2. 初始化 Qdrant 客户端 - print("连接到 Qdrant 数据库...") - client = QdrantClient( - url="http://localhost:6333", # Qdrant 默认地址 - api_key="313131" # 从之前查看的配置中获取的 API 密钥 - ) - - # 3. 创建集合(如果不存在) - collection_name = "ollama_integration_test" - print(f"创建或检查集合: {collection_name}") - - # 首先检查集合是否已存在 - collections = client.get_collections().collections - collection_exists = any(collection.name == collection_name for collection in collections) - - if not collection_exists: - # 创建新集合 - # 首先获取嵌入模型的维度 - sample_embedding = embeddings.embed_query("sample text") - vector_size = len(sample_embedding) - - client.create_collection( - collection_name=collection_name, - vectors_config=VectorParams( - size=vector_size, - distance=Distance.COSINE - ) - ) - print(f"已创建新集合,向量维度: {vector_size}") - else: - print("集合已存在") - - # 4. 准备示例数据 - documents = [ - "Qdrant 是一个高性能的向量搜索引擎", - "Ollama 是一个本地运行大语言模型的工具", - "向量数据库用于存储和检索高维向量", - "嵌入模型将文本转换为数值向量表示" - ] - - metadata = [ - {"source": "qdrant_docs", "category": "database"}, - {"source": "ollama_docs", "category": "llm"}, - {"source": "vector_db_docs", "category": "database"}, - {"source": "embedding_docs", "category": "ml"} - ] - - # 5. 使用 Ollama 生成嵌入并存储到 Qdrant - print("生成嵌入并存储到 Qdrant...") - points = [] - - for idx, (doc, meta) in enumerate(zip(documents, metadata)): - # 使用 Ollama 生成嵌入 - embedding = embeddings.embed_query(doc) - - # 创建 Qdrant 点 - point = PointStruct( - id=idx, - vector=embedding, - payload={ - "text": doc, - "metadata": meta - } - ) - points.append(point) - - # 上传点到 Qdrant - client.upsert( - collection_name=collection_name, - points=points - ) - print(f"已上传 {len(points)} 个文档到 Qdrant") - - # 6. 执行相似性搜索 - query = "什么是向量数据库?" 
- print(f"\n执行搜索查询: '{query}'") - - # 使用 Ollama 生成查询嵌入 - query_embedding = embeddings.embed_query(query) - - # 在 Qdrant 中搜索 - search_result = client.search( - collection_name=collection_name, - query_vector=query_embedding, - limit=2 - ) - - # 7. 显示搜索结果 - print("\n搜索结果:") - for i, hit in enumerate(search_result, 1): - print(f"{i}. {hit.payload['text']} (得分: {hit.score:.4f})") - print(f" 元数据: {hit.payload['metadata']}") - - print("\n集成测试完成!") - -if __name__ == "__main__": - main() \ No newline at end of file diff --git a/scripts/mcp/servers/qdrant-ollama-mcp-server.py b/scripts/mcp/servers/qdrant-ollama-mcp-server.py deleted file mode 100755 index 77bb129..0000000 --- a/scripts/mcp/servers/qdrant-ollama-mcp-server.py +++ /dev/null @@ -1,357 +0,0 @@ -#!/usr/bin/env python3 -""" -Qdrant 与 Ollama 嵌入模型集成的 MCP 服务器 -此脚本实现了一个 MCP 服务器,使用 Ollama 作为嵌入模型提供者与 Qdrant 向量数据库集成 -""" - -import asyncio -import json -import os -import sys -from typing import Any, Dict, List, Optional -import logging - -from langchain_ollama import OllamaEmbeddings -from qdrant_client import QdrantClient -from qdrant_client.models import Distance, VectorParams, PointStruct, Filter - -# 设置日志 -logging.basicConfig(level=logging.INFO) -logger = logging.getLogger(__name__) - -class QdrantOllamaMCPServer: - def __init__(self): - # 在初始化之前打印环境变量 - print(f"环境变量:") - print(f"QDRANT_URL: {os.getenv('QDRANT_URL', '未设置')}") - print(f"QDRANT_API_KEY: {os.getenv('QDRANT_API_KEY', '未设置')}") - print(f"OLLAMA_URL: {os.getenv('OLLAMA_URL', '未设置')}") - print(f"OLLAMA_MODEL: {os.getenv('OLLAMA_MODEL', '未设置')}") - print(f"COLLECTION_NAME: {os.getenv('COLLECTION_NAME', '未设置')}") - - # 从环境变量获取配置 - self.qdrant_url = os.getenv("QDRANT_URL", "http://dev1:6333") # dev1服务器上的Qdrant地址 - self.qdrant_api_key = os.getenv("QDRANT_API_KEY", "313131") - self.collection_name = os.getenv("COLLECTION_NAME", "ollama_mcp") - self.ollama_model = os.getenv("OLLAMA_MODEL", "nomic-embed-text") - self.ollama_url = os.getenv("OLLAMA_URL", "http://dev1:11434") # dev1服务器上的Ollama地址 - - # 初始化客户端 - self.embeddings = OllamaEmbeddings( - model=self.ollama_model, - base_url=self.ollama_url - ) - - self.client = QdrantClient( - url=self.qdrant_url, - api_key=self.qdrant_api_key - ) - - # 确保集合存在 - self._ensure_collection_exists() - - logger.info(f"初始化完成,使用集合: {self.collection_name}") - - def _ensure_collection_exists(self): - """确保集合存在,如果不存在则创建""" - collections = self.client.get_collections().collections - collection_exists = any(collection.name == self.collection_name for collection in collections) - - if not collection_exists: - # 获取嵌入模型的维度 - sample_embedding = self.embeddings.embed_query("sample text") - vector_size = len(sample_embedding) - - self.client.create_collection( - collection_name=self.collection_name, - vectors_config=VectorParams( - size=vector_size, - distance=Distance.COSINE - ) - ) - logger.info(f"已创建新集合,向量维度: {vector_size}") - else: - logger.info("集合已存在") - - async def handle_request(self, request: Dict[str, Any]) -> Dict[str, Any]: - """处理 MCP 请求""" - method = request.get("method") - params = request.get("params", {}) - request_id = request.get("id") - - logger.info(f"处理请求: {method}") - - try: - if method == "initialize": - result = { - "protocolVersion": "2024-11-05", - "capabilities": { - "tools": { - "listChanged": True - }, - "resources": { - "subscribe": True, - "listChanged": True - } - }, - "serverInfo": { - "name": "qdrant-ollama-mcp-server", - "version": "1.0.0" - } - } - elif method == "tools/list": - result = { - "tools": [ - { - "name": 
"add_document", - "description": "添加文档到向量数据库", - "inputSchema": { - "type": "object", - "properties": { - "text": { - "type": "string", - "description": "文档文本内容" - }, - "metadata": { - "type": "object", - "description": "文档的元数据" - } - }, - "required": ["text"] - } - }, - { - "name": "search_documents", - "description": "在向量数据库中搜索相似文档", - "inputSchema": { - "type": "object", - "properties": { - "query": { - "type": "string", - "description": "搜索查询文本" - }, - "limit": { - "type": "integer", - "description": "返回结果数量限制", - "default": 5 - }, - "filter": { - "type": "object", - "description": "搜索过滤器" - } - }, - "required": ["query"] - } - }, - { - "name": "list_collections", - "description": "列出所有集合", - "inputSchema": { - "type": "object", - "properties": {} - } - }, - { - "name": "get_collection_info", - "description": "获取集合信息", - "inputSchema": { - "type": "object", - "properties": { - "collection_name": { - "type": "string", - "description": "集合名称" - } - }, - "required": ["collection_name"] - } - } - ] - } - elif method == "tools/call": - tool_name = params.get("name") - tool_params = params.get("arguments", {}) - - if tool_name == "add_document": - result = await self._add_document(tool_params) - elif tool_name == "search_documents": - result = await self._search_documents(tool_params) - elif tool_name == "list_collections": - result = await self._list_collections(tool_params) - elif tool_name == "get_collection_info": - result = await self._get_collection_info(tool_params) - else: - raise ValueError(f"未知工具: {tool_name}") - else: - raise ValueError(f"未知方法: {method}") - - response = { - "jsonrpc": "2.0", - "id": request_id, - "result": result - } - - except Exception as e: - logger.error(f"处理请求时出错: {e}") - response = { - "jsonrpc": "2.0", - "id": request_id, - "error": { - "code": -1, - "message": str(e) - } - } - - return response - - async def _add_document(self, params: Dict[str, Any]) -> Dict[str, Any]: - """添加文档到向量数据库""" - text = params.get("text") - metadata = params.get("metadata", {}) - - if not text: - raise ValueError("文档文本不能为空") - - # 生成嵌入 - embedding = self.embeddings.embed_query(text) - - # 创建点 - point = PointStruct( - id=hash(text) % (2 ** 31), # 使用文本哈希作为ID - vector=embedding, - payload={ - "text": text, - "metadata": metadata - } - ) - - # 上传到 Qdrant - self.client.upsert( - collection_name=self.collection_name, - points=[point] - ) - - return {"success": True, "message": "文档已添加"} - - async def _search_documents(self, params: Dict[str, Any]) -> Dict[str, Any]: - """在向量数据库中搜索相似文档""" - query = params.get("query") - limit = params.get("limit", 5) - filter_dict = params.get("filter") - - if not query: - raise ValueError("搜索查询不能为空") - - # 生成查询嵌入 - query_embedding = self.embeddings.embed_query(query) - - # 构建过滤器 - search_filter = None - if filter_dict: - search_filter = Filter(**filter_dict) - - # 执行搜索 - search_result = self.client.search( - collection_name=self.collection_name, - query_vector=query_embedding, - limit=limit, - query_filter=search_filter - ) - - # 格式化结果 - results = [] - for hit in search_result: - results.append({ - "text": hit.payload.get("text", ""), - "metadata": hit.payload.get("metadata", {}), - "score": hit.score - }) - - return {"results": results} - - async def _list_collections(self, params: Dict[str, Any]) -> Dict[str, Any]: - """列出所有集合""" - collections = self.client.get_collections().collections - return { - "collections": [ - {"name": collection.name} for collection in collections - ] - } - - async def _get_collection_info(self, params: Dict[str, Any]) -> 
Dict[str, Any]: - """获取集合信息""" - collection_name = params.get("collection_name") - - if not collection_name: - raise ValueError("集合名称不能为空") - - try: - collection_info = self.client.get_collection(collection_name) - return { - "name": collection_name, - "vectors_count": collection_info.points_count, - "vectors_config": collection_info.config.params.vectors.dict() - } - except Exception as e: - raise ValueError(f"获取集合信息失败: {str(e)}") - - async def run(self): - """运行 MCP 服务器""" - logger.info("启动 Qdrant-Ollama MCP 服务器") - logger.info(f"Qdrant URL: {self.qdrant_url}") - logger.info(f"Ollama URL: {self.ollama_url}") - logger.info(f"Collection: {self.collection_name}") - - # 从标准输入读取请求 - while True: - try: - line = await asyncio.get_event_loop().run_in_executor( - None, sys.stdin.readline - ) - if not line: - break - - logger.info(f"收到请求: {line.strip()}") - - # 解析 JSON 请求 - request = json.loads(line.strip()) - - # 处理请求 - response = await self.handle_request(request) - - # 发送响应 - response_json = json.dumps(response) - print(response_json, flush=True) - logger.info(f"发送响应: {response_json}") - - except json.JSONDecodeError as e: - logger.error(f"JSON 解析错误: {e}") - except Exception as e: - logger.error(f"处理请求时出错: {e}") - except KeyboardInterrupt: - logger.info("服务器被中断") - break - -async def main(): - """主函数""" - # 设置日志级别 - logging.basicConfig( - level=logging.INFO, - format='%(asctime)s - %(name)s - %(levelname)s - %(message)s' - ) - - # 打印环境变量 - print(f"环境变量:") - print(f"QDRANT_URL: {os.getenv('QDRANT_URL', '未设置')}") - print(f"QDRANT_API_KEY: {os.getenv('QDRANT_API_KEY', '未设置')}") - print(f"OLLAMA_URL: {os.getenv('OLLAMA_URL', '未设置')}") - print(f"OLLAMA_MODEL: {os.getenv('OLLAMA_MODEL', '未设置')}") - print(f"COLLECTION_NAME: {os.getenv('COLLECTION_NAME', '未设置')}") - - # 创建服务器实例 - server = QdrantOllamaMCPServer() - - # 运行服务器 - await server.run() - -if __name__ == "__main__": - asyncio.run(main()) \ No newline at end of file diff --git a/scripts/mcp/tools/start-mcp-server.sh b/scripts/mcp/tools/start-mcp-server.sh deleted file mode 100755 index eb9f033..0000000 --- a/scripts/mcp/tools/start-mcp-server.sh +++ /dev/null @@ -1,10 +0,0 @@ -#!/bin/bash -# 设置环境变量 -export QDRANT_URL=http://dev1:6333 -export QDRANT_API_KEY=313131 -export OLLAMA_URL=http://dev1:11434 -export OLLAMA_MODEL=nomic-embed-text -export COLLECTION_NAME=ollama_mcp - -# 启动MCP服务器 -python /home/ben/qdrant/qdrant_ollama_mcp_server.py \ No newline at end of file diff --git a/scripts/register-traefik-to-all-consul.sh b/scripts/register-traefik-to-all-consul.sh new file mode 100755 index 0000000..41dfb08 --- /dev/null +++ b/scripts/register-traefik-to-all-consul.sh @@ -0,0 +1,68 @@ +#!/bin/bash + +# 向所有三个 Consul 节点注册 Traefik 服务 +# 解决 Consul leader 轮换问题 + +CONSUL_NODES=( + "master.tailnet-68f9.ts.net:8500" + "warden.tailnet-68f9.ts.net:8500" + "ash3c.tailnet-68f9.ts.net:8500" +) + +TRAEFIK_IP="100.97.62.111" +ALLOC_ID=$(nomad job allocs traefik-consul-lb | head -2 | tail -1 | awk '{print $1}') + +SERVICE_DATA_LB="{ + \"ID\": \"traefik-consul-lb-${ALLOC_ID}\", + \"Name\": \"consul-lb\", + \"Tags\": [\"consul\", \"loadbalancer\", \"traefik\", \"multi-node\"], + \"Address\": \"${TRAEFIK_IP}\", + \"Port\": 80, + \"Check\": { + \"HTTP\": \"http://${TRAEFIK_IP}:80/\", + \"Interval\": \"30s\", + \"Timeout\": \"15s\" + } +}" + +SERVICE_DATA_DASHBOARD="{ + \"ID\": \"traefik-dashboard-${ALLOC_ID}\", + \"Name\": \"traefik-dashboard\", + \"Tags\": [\"traefik\", \"dashboard\", \"multi-node\"], + \"Address\": \"${TRAEFIK_IP}\", + \"Port\": 8080, + \"Check\": { 
+ \"HTTP\": \"http://${TRAEFIK_IP}:8080/api/overview\", + \"Interval\": \"30s\", + \"Timeout\": \"15s\" + } +}" + +echo "Registering Traefik services to all Consul nodes..." +echo "Allocation ID: ${ALLOC_ID}" +echo "Traefik IP: ${TRAEFIK_IP}" + +for node in "${CONSUL_NODES[@]}"; do + echo "Registering to ${node}..." + + # 注册 consul-lb 服务 + curl -s -X PUT "http://${node}/v1/agent/service/register" \ + -H "Content-Type: application/json" \ + -d "${SERVICE_DATA_LB}" + + # 注册 traefik-dashboard 服务 + curl -s -X PUT "http://${node}/v1/agent/service/register" \ + -H "Content-Type: application/json" \ + -d "${SERVICE_DATA_DASHBOARD}" + + echo "✓ Registered to ${node}" +done + +echo "" +echo "🎉 Services registered to all Consul nodes!" +echo "" +echo "Verification:" +for node in "${CONSUL_NODES[@]}"; do + echo "Services on ${node}:" + curl -s "http://${node}/v1/catalog/services" | jq -r 'keys[]' | grep -E "(consul-lb|traefik-dashboard)" | sed 's/^/ - /' +done diff --git a/scripts/setup/config/generate-consul-config.sh b/scripts/setup/config/generate-consul-config.sh deleted file mode 100755 index 8404e52..0000000 --- a/scripts/setup/config/generate-consul-config.sh +++ /dev/null @@ -1,61 +0,0 @@ -#!/bin/bash - -# Consul配置生成脚本 -# 此脚本使用Consul模板从KV存储生成最终的Consul配置文件 - -set -e - -# 配置参数 -CONSUL_ADDR="${CONSUL_ADDR:-localhost:8500}" -ENVIRONMENT="${ENVIRONMENT:-dev}" -CONSUL_CONFIG_DIR="${CONSUL_CONFIG_DIR:-/root/mgmt/components/consul/configs}" -CONSUL_TEMPLATE_CMD="${CONSUL_TEMPLATE_CMD:-consul-template}" - -echo "开始生成Consul配置文件..." -echo "Consul地址: $CONSUL_ADDR" -echo "环境: $ENVIRONMENT" -echo "配置目录: $CONSUL_CONFIG_DIR" - -# 检查Consul连接 -echo "检查Consul连接..." -if ! curl -s "$CONSUL_ADDR/v1/status/leader" | grep -q "."; then - echo "错误: 无法连接到Consul服务器 $CONSUL_ADDR" - exit 1 -fi -echo "Consul连接成功" - -# 检查consul-template是否可用 -if ! command -v $CONSUL_TEMPLATE_CMD &> /dev/null; then - echo "错误: consul-template 命令不可用,请安装consul-template" - exit 1 -fi - -# 设置环境变量 -export CONSUL_ADDR -export ENVIRONMENT - -# 使用consul-template生成配置文件 -echo "使用consul-template生成配置文件..." -$CONSUL_TEMPLATE_CMD \ - -template="$CONSUL_CONFIG_DIR/consul.hcl.tmpl:$CONSUL_CONFIG_DIR/consul.hcl" \ - -once \ - -consul-addr="$CONSUL_ADDR" - -# 验证生成的配置文件 -if [ -f "$CONSUL_CONFIG_DIR/consul.hcl" ]; then - echo "配置文件生成成功: $CONSUL_CONFIG_DIR/consul.hcl" - - # 验证配置文件语法 - echo "验证配置文件语法..." - if consul validate $CONSUL_CONFIG_DIR/consul.hcl; then - echo "配置文件语法验证通过" - else - echo "错误: 配置文件语法验证失败" - exit 1 - fi -else - echo "错误: 配置文件生成失败" - exit 1 -fi - -echo "Consul配置文件生成完成" \ No newline at end of file diff --git a/scripts/setup/config/setup-consul-cluster-variables.sh b/scripts/setup/config/setup-consul-cluster-variables.sh deleted file mode 100755 index 23c5c38..0000000 --- a/scripts/setup/config/setup-consul-cluster-variables.sh +++ /dev/null @@ -1,104 +0,0 @@ -#!/bin/bash - -# Consul变量配置脚本 - 遵循最佳命名规范 -# 此脚本将Consul集群配置存储到Consul KV中,遵循 config/{environment}/{provider}/{region_or_service}/{key} 格式 - -set -e - -# 配置参数 -CONSUL_ADDR="${CONSUL_ADDR:-localhost:8500}" -ENVIRONMENT="${ENVIRONMENT:-dev}" -CONSUL_CONFIG_DIR="${CONSUL_CONFIG_DIR:-/root/mgmt/components/consul/configs}" - -echo "开始配置Consul变量,遵循最佳命名规范..." -echo "Consul地址: $CONSUL_ADDR" -echo "环境: $ENVIRONMENT" - -# 检查Consul连接 -echo "检查Consul连接..." -if ! curl -s "$CONSUL_ADDR/v1/status/leader" | grep -q "."; then - echo "错误: 无法连接到Consul服务器 $CONSUL_ADDR" - exit 1 -fi -echo "Consul连接成功" - -# 创建Consul集群配置变量 -echo "创建Consul集群配置变量..." 
- -# 基础配置 -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/cluster/data_dir" -d "/opt/consul/data" -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/cluster/raft_dir" -d "/opt/consul/raft" -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/cluster/datacenter" -d "dc1" -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/cluster/bootstrap_expect" -d "3" -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/cluster/log_level" -d "INFO" -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/cluster/encrypt_key" -d "YourEncryptionKeyHere" - -# UI配置 -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/ui/enabled" -d "true" - -# 网络配置 -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/network/client_addr" -d "0.0.0.0" -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/network/bind_interface" -d "eth0" -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/network/advertise_interface" -d "eth0" - -# 端口配置 -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/ports/dns" -d "8600" -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/ports/http" -d "8500" -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/ports/https" -d "-1" -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/ports/grpc" -d "8502" -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/ports/grpc_tls" -d "8503" -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/ports/serf_lan" -d "8301" -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/ports/serf_wan" -d "8302" -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/ports/server" -d "8300" - -# 节点配置 -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/nodes/master/ip" -d "100.117.106.136" -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/nodes/ash3c/ip" -d "100.116.80.94" -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/nodes/warden/ip" -d "100.122.197.112" - -# 服务发现配置 -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/service/enable_script_checks" -d "true" -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/service/enable_local_script_checks" -d "true" -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/service/enable_service_script" -d "true" - -# 性能配置 -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/performance/raft_multiplier" -d "1" - -# 日志配置 -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/log/enable_syslog" -d "false" -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/log/log_file" -d "/var/log/consul/consul.log" - -# 连接配置 -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/connection/reconnect_timeout" -d "30s" -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/connection/reconnect_timeout_wan" -d "30s" -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/connection/session_ttl_min" -d "10s" - -# Autopilot配置 -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/autopilot/cleanup_dead_servers" -d "true" -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/autopilot/last_contact_threshold" -d "200ms" -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/autopilot/max_trailing_logs" -d "250" -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/autopilot/server_stabilization_time" -d "10s" -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/autopilot/disable_upgrade_migration" -d "false" -# 添加领导者优先级配置 -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/autopilot/redundancy_zone_tag_master" -d 
"vice_president" -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/autopilot/redundancy_zone_tag_warden" -d "president" - -# 快照配置 -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/snapshot/enabled" -d "true" -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/snapshot/interval" -d "24h" -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/snapshot/retain" -d "30" -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/snapshot/name" -d "consul-snapshot-{{.Timestamp}}" - -# 备份配置 -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/backup/enabled" -d "true" -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/backup/interval" -d "6h" -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/backup/retain" -d "7" -curl -X PUT "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/backup/name" -d "consul-backup-{{.Timestamp}}" - -echo "Consul变量配置完成" - -# 验证配置 -echo "验证配置..." -curl -s "$CONSUL_ADDR/v1/kv/config/$ENVIRONMENT/consul/?keys" | jq -r '.[]' | head -10 - -echo "Consul变量配置脚本执行完成" \ No newline at end of file diff --git a/scripts/setup/config/setup-consul-variables-and-storage.sh b/scripts/setup/config/setup-consul-variables-and-storage.sh deleted file mode 100755 index c6656ee..0000000 --- a/scripts/setup/config/setup-consul-variables-and-storage.sh +++ /dev/null @@ -1,261 +0,0 @@ -#!/bin/bash - -# Consul 变量和存储配置脚本 -# 用于增强Consul集群功能 - -set -e - -# 颜色输出 -RED='\033[0;31m' -GREEN='\033[0;32m' -YELLOW='\033[1;33m' -NC='\033[0m' # No Color - -# 日志函数 -log_info() { - echo -e "${GREEN}[INFO]${NC} $1" -} - -log_warn() { - echo -e "${YELLOW}[WARN]${NC} $1" -} - -log_error() { - echo -e "${RED}[ERROR]${NC} $1" -} - -# 默认Consul地址 -CONSUL_ADDR=${CONSUL_ADDR:-"http://localhost:8500"} - -# 检查Consul连接 -check_consul() { - log_info "检查Consul连接..." - if curl -s "${CONSUL_ADDR}/v1/status/leader" > /dev/null; then - log_info "Consul连接正常" - return 0 - else - log_error "无法连接到Consul: ${CONSUL_ADDR}" - return 1 - fi -} - -# 配置Consul变量 -setup_variables() { - log_info "配置Consul变量..." - - # 环境变量 - ENVIRONMENT=${ENVIRONMENT:-"dev"} - - # 创建基础配置结构 - log_info "创建基础配置结构..." - - # 应用配置 - curl -s -X PUT "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT}/app/name" -d "my-application" > /dev/null - curl -s -X PUT "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT}/app/version" -d "1.0.0" > /dev/null - curl -s -X PUT "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT}/app/environment" -d "${ENVIRONMENT}" > /dev/null - - # 数据库配置 - curl -s -X PUT "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT}/database/host" -d "db.example.com" > /dev/null - curl -s -X PUT "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT}/database/port" -d "5432" > /dev/null - curl -s -X PUT "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT}/database/name" -d "myapp_db" > /dev/null - - # 缓存配置 - curl -s -X PUT "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT}/cache/host" -d "redis.example.com" > /dev/null - curl -s -X PUT "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT}/cache/port" -d "6379" > /dev/null - - # 消息队列配置 - curl -s -X PUT "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT}/mq/host" -d "mq.example.com" > /dev/null - curl -s -X PUT "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT}/mq/port" -d "5672" > /dev/null - - # 特性开关 - curl -s -X PUT "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT}/features/new_ui" -d "true" > /dev/null - curl -s -X PUT "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT}/features/advanced_analytics" -d "false" > /dev/null - - log_info "Consul变量配置完成" -} - -# 配置Consul存储 -setup_storage() { - log_info "配置Consul存储..." 
- - # 创建存储配置 - # 注意:这些配置需要在Consul配置文件中启用相应的存储后端 - - # 持久化存储配置 - curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/data_dir" -d "/opt/consul/data" > /dev/null - curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/raft_dir" -d "/opt/consul/raft" > /dev/null - - # 快照配置 - curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/snapshot_enabled" -d "true" > /dev/null - curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/snapshot_interval" -d "24h" > /dev/null - curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/snapshot_retention" -d "30" > /dev/null - - # 备份配置 - curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/backup_enabled" -d "true" > /dev/null - curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/backup_interval" -d "6h" > /dev/null - curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/backup_retention" -d "7" > /dev/null - - # 自动清理配置 - curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/autopilot/cleanup_dead_servers" -d "true" > /dev/null - curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/autopilot/last_contact_threshold" -d "200ms" > /dev/null - curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/autopilot/max_trailing_logs" -d "250" > /dev/null - curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/autopilot/server_stabilization_time" -d "10s" > /dev/null - curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/autopilot/redundancy_zone_tag" -d "" > /dev/null - curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/autopilot/disable_upgrade_migration" -d "false" > /dev/null - curl -s -X PUT "${CONSUL_ADDR}/v1/kv/storage/consul/autopilot/upgrade_version_tag" -d "" > /dev/null - - log_info "Consul存储配置完成" -} - -# 创建Consul配置文件 -create_consul_config() { - log_info "创建Consul配置文件..." - - # 创建配置目录 - mkdir -p /root/mgmt/components/consul/configs - - # 创建基础配置文件 - cat > /root/mgmt/components/consul/configs/consul.hcl << EOF -# Consul 基础配置 -data_dir = "/opt/consul/data" -raft_dir = "/opt/consul/raft" - -# 启用UI -ui_config { - enabled = true -} - -# 数据中心配置 -datacenter = "dc1" - -# 服务器配置 -server = true -bootstrap_expect = 3 - -# 客户端地址 -client_addr = "0.0.0.0" - -# 绑定地址 -bind_addr = "{{ GetInterfaceIP `eth0` }}" - -# 广告地址 -advertise_addr = "{{ GetInterfaceIP `eth0` }}" - -# 端口配置 -ports { - dns = 8600 - http = 8500 - https = -1 - grpc = 8502 - grpc_tls = 8503 - serf_lan = 8301 - serf_wan = 8302 - server = 8300 -} - -# 连接其他节点 -retry_join = ["100.117.106.136", "100.116.80.94", "100.122.197.112"] - -# 启用服务发现 -enable_service_script = true - -# 启用脚本检查 -enable_script_checks = true - -# 启用本地脚本检查 -enable_local_script_checks = true - -# 性能调优 -performance { - raft_multiplier = 1 -} - -# 日志配置 -log_level = "INFO" -enable_syslog = false -log_file = "/var/log/consul/consul.log" - -# 自动加密 -encrypt = "YourEncryptionKeyHere" - -# 重用端口 -reconnect_timeout = "30s" -reconnect_timeout_wan = "30s" - -# 会话TTL -session_ttl_min = "10s" - -# 自动清理 -autopilot { - cleanup_dead_servers = true - last_contact_threshold = "200ms" - max_trailing_logs = 250 - server_stabilization_time = "10s" - redundancy_zone_tag = "" - disable_upgrade_migration = false - upgrade_version_tag = "" -} - -# 快照配置 -snapshot { - enabled = true - interval = "24h" - retain = 30 - name = "consul-snapshot-{{.Timestamp}}" -} - -# 备份配置 -backup { - enabled = true - interval = "6h" - retain = 7 - name = "consul-backup-{{.Timestamp}}" -} -EOF - - log_info "Consul配置文件创建完成: /root/mgmt/components/consul/configs/consul.hcl" -} - -# 显示配置 -show_config() { - log_info "显示Consul变量配置..." 
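The consul.hcl heredoc above hard-codes values that were also written to KV; the consul-template flow in generate-consul-config.sh would instead render them at generation time. A minimal sketch of what such a `consul.hcl.tmpl` might contain (template body assumed, not taken from the repo; `CONSUL_CONFIG_DIR` reused from generate-consul-config.sh):

```bash
# consul-template's `key` function reads the KV entries seeded earlier
cat > "$CONSUL_CONFIG_DIR/consul.hcl.tmpl" << 'EOF'
datacenter       = "{{ key "config/dev/consul/cluster/datacenter" }}"
data_dir         = "{{ key "config/dev/consul/cluster/data_dir" }}"
bootstrap_expect = {{ key "config/dev/consul/cluster/bootstrap_expect" }}
log_level        = "{{ key "config/dev/consul/cluster/log_level" }}"
EOF

# Render once against the agent, as the generation script does
consul-template -once -consul-addr="$CONSUL_ADDR" \
    -template="$CONSUL_CONFIG_DIR/consul.hcl.tmpl:$CONSUL_CONFIG_DIR/consul.hcl"
```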
- echo "==========================================" - curl -s "${CONSUL_ADDR}/v1/kv/config/${ENVIRONMENT:-dev}/?recurse" | jq -r '.[] | "\(.Key): \(.Value | @base64d)"' - echo "==========================================" - - log_info "显示Consul存储配置..." - echo "==========================================" - curl -s "${CONSUL_ADDR}/v1/kv/storage/?recurse" | jq -r '.[] | "\(.Key): \(.Value | @base64d)"' - echo "==========================================" -} - -# 主函数 -main() { - log_info "开始配置Consul变量和存储..." - - # 检查Consul连接 - check_consul - - # 配置变量 - setup_variables - - # 配置存储 - setup_storage - - # 创建配置文件 - create_consul_config - - # 显示配置 - show_config - - log_info "Consul变量和存储配置完成" - - # 提示下一步 - log_info "下一步操作:" - log_info "1. 重启Consul服务以应用新配置" - log_info "2. 验证配置是否生效" - log_info "3. 根据需要调整配置参数" -} - -# 执行主函数 -main "$@" \ No newline at end of file diff --git a/scripts/setup/environment/setup-environment.sh b/scripts/setup/environment/setup-environment.sh deleted file mode 100755 index 2915dc1..0000000 --- a/scripts/setup/environment/setup-environment.sh +++ /dev/null @@ -1,149 +0,0 @@ -#!/bin/bash - -# 环境设置脚本 -# 用于设置开发环境的必要组件和依赖 - -set -euo pipefail - -# 颜色定义 -RED='\033[0;31m' -GREEN='\033[0;32m' -YELLOW='\033[1;33m' -BLUE='\033[0;34m' -NC='\033[0m' # No Color - -# 日志函数 -log_info() { - echo -e "${BLUE}[INFO]${NC} $1" -} - -log_success() { - echo -e "${GREEN}[SUCCESS]${NC} $1" -} - -log_warning() { - echo -e "${YELLOW}[WARNING]${NC} $1" -} - -log_error() { - echo -e "${RED}[ERROR]${NC} $1" -} - -# 检查必要的工具 -check_dependencies() { - log_info "检查系统依赖..." - - local deps=("git" "curl" "wget" "jq" "docker" "podman") - local missing_deps=() - - for dep in "${deps[@]}"; do - if ! command -v "$dep" &> /dev/null; then - missing_deps+=("$dep") - fi - done - - if [ ${#missing_deps[@]} -ne 0 ]; then - log_warning "缺少以下依赖: ${missing_deps[*]}" - log_info "请安装缺少的依赖后重新运行" - return 1 - fi - - log_success "所有依赖检查通过" -} - -# 设置环境变量 -setup_environment_variables() { - log_info "设置环境变量..." - - # 创建环境变量文件 - cat > .env << EOF -# 项目环境变量 -PROJECT_ROOT=$(pwd) -SCRIPTS_DIR=\${PROJECT_ROOT}/scripts - -# Vault 配置 -VAULT_ADDR=http://127.0.0.1:8200 -VAULT_DEV_ROOT_TOKEN_ID=myroot - -# Consul 配置 -CONSUL_HTTP_ADDR=http://127.0.0.1:8500 - -# Nomad 配置 -NOMAD_ADDR=http://127.0.0.1:4646 - -# MCP 配置 -MCP_SERVER_PORT=3000 -EOF - - log_success "环境变量文件已创建: .env" -} - -# 创建必要的目录 -create_directories() { - log_info "创建必要的目录..." - - local dirs=( - "logs" - "tmp" - "data" - "backups/vault" - "backups/consul" - "backups/nomad" - ) - - for dir in "${dirs[@]}"; do - mkdir -p "$dir" - log_info "创建目录: $dir" - done - - log_success "目录创建完成" -} - -# 设置脚本权限 -setup_script_permissions() { - log_info "设置脚本执行权限..." - - find scripts/ -name "*.sh" -exec chmod +x {} \; - - log_success "脚本权限设置完成" -} - -# 初始化 Git hooks(如果需要) -setup_git_hooks() { - log_info "设置 Git hooks..." - - if [ -d ".git" ]; then - # 创建 pre-commit hook - cat > .git/hooks/pre-commit << 'EOF' -#!/bin/bash -# 运行基本的代码检查 -echo "运行 pre-commit 检查..." - -# 检查脚本语法 -find scripts/ -name "*.sh" -exec bash -n {} \; || exit 1 - -echo "Pre-commit 检查通过" -EOF - chmod +x .git/hooks/pre-commit - log_success "Git hooks 设置完成" - else - log_warning "不是 Git 仓库,跳过 Git hooks 设置" - fi -} - -# 主函数 -main() { - log_info "开始环境设置..." - - check_dependencies || exit 1 - setup_environment_variables - create_directories - setup_script_permissions - setup_git_hooks - - log_success "环境设置完成!" 
- log_info "请运行 'source .env' 来加载环境变量" -} - -# 执行主函数 -main "$@" \ No newline at end of file diff --git a/scripts/setup/init/init-vault-cluster.sh b/scripts/setup/init/init-vault-cluster.sh deleted file mode 100755 index 8f8a0e4..0000000 --- a/scripts/setup/init/init-vault-cluster.sh +++ /dev/null @@ -1,122 +0,0 @@ -#!/bin/bash -# Vault集群初始化和解封脚本 - -set -e - -echo "===== Vault集群初始化 =====" - -# 颜色定义 -GREEN='\033[0;32m' -YELLOW='\033[1;33m' -RED='\033[0;31m' -NC='\033[0m' # No Color - -# 函数定义 -log_info() { - echo -e "${GREEN}[INFO]${NC} $1" -} - -log_warn() { - echo -e "${YELLOW}[WARN]${NC} $1" -} - -log_error() { - echo -e "${RED}[ERROR]${NC} $1" -} - -# 检查Vault命令是否存在 -if ! command -v vault &> /dev/null; then - log_error "Vault命令未找到,请先安装Vault" - exit 1 -fi - -# 设置Vault地址为master节点 -export VAULT_ADDR='http://100.117.106.136:8200' - -# 等待Vault启动 -log_info "等待Vault启动..." -for i in {1..30}; do - if curl -s "$VAULT_ADDR/v1/sys/health" > /dev/null; then - break - fi - echo -n "." - sleep 2 -done -echo "" - -# 检查Vault是否已初始化 -init_status=$(curl -s "$VAULT_ADDR/v1/sys/health" | grep -o '"initialized":[^,}]*' | cut -d ':' -f2) -if [ "$init_status" = "false" ]; then - log_info "Vault未初始化,正在初始化..." - - # 初始化Vault并保存密钥到安全目录 - vault operator init -key-shares=5 -key-threshold=3 -format=json > /root/mgmt/security/secrets/vault/init_keys.json - - if [ $? -eq 0 ]; then - log_info "Vault初始化成功" - log_warn "重要:请立即将以下文件安全备份并分发给不同管理员" - log_warn "密钥文件位置: /root/mgmt/security/secrets/vault/init_keys.json" - - # 显示关键信息但不显示完整密钥 - unseal_keys_count=$(cat /root/mgmt/security/secrets/vault/init_keys.json | grep -o '"unseal_keys_b64":\[\([^]]*\)' | sed 's/"unseal_keys_b64":\[//g' | tr ',' '\n' | wc -l) - root_token=$(cat /root/mgmt/security/secrets/vault/init_keys.json | grep -o '"root_token":"[^"]*"' | cut -d '"' -f4) - - log_info "生成了 $unseal_keys_count 个解封密钥,需要其中任意 3 个来解封Vault" - log_info "根令牌已生成(请安全保管)" - - # 提取解封密钥用于自动解封 - unseal_key1=$(cat /root/mgmt/security/secrets/vault/init_keys.json | grep -o '"unseal_keys_b64":\[\([^]]*\)' | sed 's/"unseal_keys_b64":\[//g' | tr ',' '\n' | sed 's/"//g' | head -1) - unseal_key2=$(cat /root/mgmt/security/secrets/vault/init_keys.json | grep -o '"unseal_keys_b64":\[\([^]]*\)' | sed 's/"unseal_keys_b64":\[//g' | tr ',' '\n' | sed 's/"//g' | head -2 | tail -1) - unseal_key3=$(cat /root/mgmt/security/secrets/vault/init_keys.json | grep -o '"unseal_keys_b64":\[\([^]]*\)' | sed 's/"unseal_keys_b64":\[//g' | tr ',' '\n' | sed 's/"//g' | head -3 | tail -1) - - # 解封所有节点 - log_info "正在解封所有Vault节点..." 
- - # 解封master节点 - export VAULT_ADDR='http://100.117.106.136:8200' - vault operator unseal "$unseal_key1" - vault operator unseal "$unseal_key2" - vault operator unseal "$unseal_key3" - - # 解封ash3c节点 - export VAULT_ADDR='http://100.116.80.94:8200' - vault operator unseal "$unseal_key1" - vault operator unseal "$unseal_key2" - vault operator unseal "$unseal_key3" - - # 解封warden节点 - export VAULT_ADDR='http://100.122.197.112:8200' - vault operator unseal "$unseal_key1" - vault operator unseal "$unseal_key2" - vault operator unseal "$unseal_key3" - - log_info "所有Vault节点已成功解封" - log_warn "请确保将密钥文件安全备份到多个位置,并按照安全策略分发给不同管理员" - log_info "根令牌: $root_token" - - # 显示Vault状态 - log_info "Vault集群状态:" - export VAULT_ADDR='http://100.117.106.136:8200' - vault status - else - log_error "Vault初始化失败" - exit 1 - fi -else - log_info "Vault已初始化" - - # 检查Vault是否已解封 - sealed_status=$(curl -s "$VAULT_ADDR/v1/sys/health" | grep -o '"sealed":[^,}]*' | cut -d ':' -f2) - if [ "$sealed_status" = "true" ]; then - log_warn "Vault已初始化但仍处于密封状态,请手动解封" - log_info "使用以下命令解封Vault:" - log_info "export VAULT_ADDR='http://<节点IP>:8200'" - log_info "vault operator unseal <解封密钥1>" - log_info "vault operator unseal <解封密钥2>" - log_info "vault operator unseal <解封密钥3>" - else - log_info "Vault已初始化且已解封,可以正常使用" - fi -fi - -log_info "===== Vault集群初始化和解封完成 =====" \ No newline at end of file diff --git a/scripts/setup/init/init-vault-dev-api.sh b/scripts/setup/init/init-vault-dev-api.sh deleted file mode 100755 index 7c554ce..0000000 --- a/scripts/setup/init/init-vault-dev-api.sh +++ /dev/null @@ -1,129 +0,0 @@ -#!/bin/bash -# 通过API初始化Vault开发环境(无需本地vault命令) - -set -e - -echo "===== 通过API初始化Vault开发环境 =====" - -# 颜色定义 -GREEN='\033[0;32m' -YELLOW='\033[1;33m' -RED='\033[0;31m' -NC='\033[0m' # No Color - -# 函数定义 -log_info() { - echo -e "${GREEN}[INFO]${NC} $1" -} - -log_warn() { - echo -e "${YELLOW}[WARN]${NC} $1" -} - -log_error() { - echo -e "${RED}[ERROR]${NC} $1" -} - -# 设置主节点地址 -VAULT_MASTER_ADDR='http://100.117.106.136:8200' - -# 等待Vault启动 -log_info "等待Vault启动..." -for i in {1..30}; do - if curl -s "$VAULT_MASTER_ADDR/v1/sys/health" > /dev/null; then - break - fi - echo -n "." - sleep 2 -done -echo "" - -# 检查Vault是否已初始化 -init_status=$(curl -s "$VAULT_MASTER_ADDR/v1/sys/health" | grep -o '"initialized":[^,}]*' | cut -d ':' -f2) -if [ "$init_status" = "false" ]; then - log_info "Vault未初始化,正在通过API初始化..." - - # 通过API初始化Vault(1个密钥,阈值1) - init_response=$(curl -s -X POST \ - -H "Content-Type: application/json" \ - -d '{ - "secret_shares": 1, - "secret_threshold": 1 - }' \ - "$VAULT_MASTER_ADDR/v1/sys/init") - - # 保存响应到文件 - echo "$init_response" > /root/mgmt/security/secrets/vault/dev/init_keys.json - - if echo "$init_response" | grep -q "keys_base64"; then - log_info "Vault初始化成功(开发模式)" - log_warn "注意:这是开发模式,仅使用1个解封密钥" - log_warn "生产环境请使用5个密钥中的3个阈值" - - # 提取密钥和令牌 - unseal_key=$(echo "$init_response" | grep -o '"keys_base64":\["[^"]*"' | cut -d '"' -f4) - root_token=$(echo "$init_response" | grep -o '"root_token":"[^"]*"' | cut -d '"' -f4) - - log_info "解封密钥: $unseal_key" - log_info "根令牌: $root_token" - - # 自动解封所有节点 - log_info "正在自动解封所有Vault节点..." 
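The three per-node unseal calls that follow could equally be written as a loop; `/v1/sys/unseal` returns the node's current seal status, so printing `.sealed` doubles as verification. A sketch with the node IPs from this script:

```bash
for node in 100.117.106.136 100.116.80.94 100.122.197.112; do
    echo -n "${node}: sealed="
    curl -s -X POST \
        -H "Content-Type: application/json" \
        -d "{\"key\": \"$unseal_key\"}" \
        "http://${node}:8200/v1/sys/unseal" | jq -r '.sealed'
done
```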
- - # 解封master节点 - curl -s -X POST \ - -H "Content-Type: application/json" \ - -d "{\"key\": \"$unseal_key\"}" \ - "$VAULT_MASTER_ADDR/v1/sys/unseal" > /dev/null - - # 解封ash3c节点 - curl -s -X POST \ - -H "Content-Type: application/json" \ - -d "{\"key\": \"$unseal_key\"}" \ - "http://100.116.80.94:8200/v1/sys/unseal" > /dev/null - - # 解封warden节点 - curl -s -X POST \ - -H "Content-Type: application/json" \ - -d "{\"key\": \"$unseal_key\"}" \ - "http://100.122.197.112:8200/v1/sys/unseal" > /dev/null - - log_info "所有Vault节点已成功解封" - - # 显示Vault状态 - log_info "Vault集群状态:" - curl -s "$VAULT_MASTER_ADDR/v1/sys/health" | jq . - - # 保存环境变量以便后续使用 - echo "export VAULT_ADDR='$VAULT_MASTER_ADDR'" > /root/mgmt/security/secrets/vault/dev/vault_env.sh - echo "export VAULT_TOKEN='$root_token'" >> /root/mgmt/security/secrets/vault/dev/vault_env.sh - log_info "环境变量已保存到: /root/mgmt/security/secrets/vault/dev/vault_env.sh" - - log_warn "开发环境提示:" - log_warn "1. 请勿在生产环境中使用此配置" - log_warn "2. 生产环境应使用5个密钥中的3个阈值" - log_warn "3. 密钥应分发给不同管理员保管" - else - log_error "Vault初始化失败" - log_error "响应: $init_response" - exit 1 - fi -else - log_info "Vault已初始化" - - # 检查Vault是否已解封 - sealed_status=$(curl -s "$VAULT_MASTER_ADDR/v1/sys/health" | grep -o '"sealed":[^,}]*' | cut -d ':' -f2) - if [ "$sealed_status" = "true" ]; then - log_warn "Vault已初始化但仍处于密封状态" - log_info "请使用API解封:" - log_info "curl -X POST -d '{\"key\": \"<解封密钥>\"}' $VAULT_MASTER_ADDR/v1/sys/unseal" - else - log_info "Vault已初始化且已解封,可以正常使用" - - # 显示Vault状态 - log_info "Vault集群状态:" - curl -s "$VAULT_MASTER_ADDR/v1/sys/health" | jq . - fi -fi - -log_info "===== Vault开发环境初始化完成 =====" \ No newline at end of file diff --git a/scripts/setup/init/init-vault-dev.sh b/scripts/setup/init/init-vault-dev.sh deleted file mode 100755 index b9edbee..0000000 --- a/scripts/setup/init/init-vault-dev.sh +++ /dev/null @@ -1,122 +0,0 @@ -#!/bin/bash -# Vault开发环境初始化脚本 - -set -e - -echo "===== Vault开发环境初始化 =====" - -# 颜色定义 -GREEN='\033[0;32m' -YELLOW='\033[1;33m' -RED='\033[0;31m' -NC='\033[0m' # No Color - -# 函数定义 -log_info() { - echo -e "${GREEN}[INFO]${NC} $1" -} - -log_warn() { - echo -e "${YELLOW}[WARN]${NC} $1" -} - -log_error() { - echo -e "${RED}[ERROR]${NC} $1" -} - -# 检查Vault命令是否存在 -if ! command -v vault &> /dev/null; then - log_error "Vault命令未找到,请先安装Vault" - exit 1 -fi - -# 设置Vault地址为master节点 -export VAULT_ADDR='http://100.117.106.136:8200' - -# 等待Vault启动 -log_info "等待Vault启动..." -for i in {1..30}; do - if curl -s "$VAULT_ADDR/v1/sys/health" > /dev/null; then - break - fi - echo -n "." - sleep 2 -done -echo "" - -# 检查Vault是否已初始化 -init_status=$(curl -s "$VAULT_ADDR/v1/sys/health" | grep -o '"initialized":[^,}]*' | cut -d ':' -f2) -if [ "$init_status" = "false" ]; then - log_info "Vault未初始化,正在初始化..." - - # 初始化Vault并保存密钥到开发目录 - vault operator init -key-shares=1 -key-threshold=1 -format=json > /root/mgmt/security/secrets/vault/dev/init_keys.json - - if [ $? -eq 0 ]; then - log_info "Vault初始化成功(开发模式)" - log_warn "注意:这是开发模式,仅使用1个解封密钥" - log_warn "生产环境请使用5个密钥中的3个阈值" - - # 显示密钥信息 - unseal_key=$(cat /root/mgmt/security/secrets/vault/dev/init_keys.json | grep -o '"unseal_keys_b64":\["[^"]*"' | cut -d '"' -f4) - root_token=$(cat /root/mgmt/security/secrets/vault/dev/init_keys.json | grep -o '"root_token":"[^"]*"' | cut -d '"' -f4) - - log_info "解封密钥: $unseal_key" - log_info "根令牌: $root_token" - - # 自动解封所有节点 - log_info "正在自动解封所有Vault节点..." 
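Before (or after) unsealing, each node's state can also be checked without the vault CLI via the unauthenticated `/v1/sys/seal-status` endpoint — a small verification sketch using the same node IPs:

```bash
for node in 100.117.106.136 100.116.80.94 100.122.197.112; do
    sealed=$(curl -s "http://${node}:8200/v1/sys/seal-status" | jq -r '.sealed')
    echo "${node}: sealed=${sealed}"
done
```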
-
-    # 解封master节点
-    export VAULT_ADDR='http://100.117.106.136:8200'
-    vault operator unseal "$unseal_key"
-
-    # 解封ash3c节点
-    export VAULT_ADDR='http://100.116.80.94:8200'
-    vault operator unseal "$unseal_key"
-
-    # 解封warden节点
-    export VAULT_ADDR='http://100.122.197.112:8200'
-    vault operator unseal "$unseal_key"
-
-    log_info "所有Vault节点已成功解封"
-
-    # 显示Vault状态
-    log_info "Vault集群状态:"
-    export VAULT_ADDR='http://100.117.106.136:8200'
-    vault status
-
-    # 保存环境变量以便后续使用
-    echo "export VAULT_ADDR='http://100.117.106.136:8200'" > /root/mgmt/security/secrets/vault/dev/vault_env.sh
-    echo "export VAULT_TOKEN='$root_token'" >> /root/mgmt/security/secrets/vault/dev/vault_env.sh
-    log_info "环境变量已保存到: /root/mgmt/security/secrets/vault/dev/vault_env.sh"
-
-    log_warn "开发环境提示:"
-    log_warn "1. 请勿在生产环境中使用此配置"
-    log_warn "2. 生产环境应使用5个密钥中的3个阈值"
-    log_warn "3. 密钥应分发给不同管理员保管"
-  else
-    log_error "Vault初始化失败"
-    exit 1
-  fi
-else
-  log_info "Vault已初始化"
-
-  # 检查Vault是否已解封
-  sealed_status=$(curl -s "$VAULT_ADDR/v1/sys/health" | grep -o '"sealed":[^,}]*' | cut -d ':' -f2)
-  if [ "$sealed_status" = "true" ]; then
-    log_warn "Vault已初始化但仍处于密封状态"
-    log_info "请使用以下命令解封:"
-    log_info "export VAULT_ADDR='http://<节点IP>:8200'"
-    log_info "vault operator unseal <解封密钥>"
-  else
-    log_info "Vault已初始化且已解封,可以正常使用"
-
-    # 显示Vault状态
-    log_info "Vault集群状态:"
-    export VAULT_ADDR='http://100.117.106.136:8200'
-    vault status
-  fi
-fi
-
-log_info "===== Vault开发环境初始化完成 ====="
\ No newline at end of file
diff --git a/scripts/test-consul-apt-install.sh b/scripts/test-consul-apt-install.sh
new file mode 100755
index 0000000..c4d4ad3
--- /dev/null
+++ b/scripts/test-consul-apt-install.sh
@@ -0,0 +1,45 @@
+#!/bin/bash
+
+# 测试 Consul APT 安装和配置
+
+echo "🧪 测试 Consul APT 安装流程"
+echo "================================"
+
+# 测试目标节点
+TEST_NODE="hcp1.tailnet-68f9.ts.net"
+
+echo "1. 测试 HashiCorp 源配置..."
+ssh $TEST_NODE "curl -s https://apt.releases.hashicorp.com/gpg | gpg --dearmor | sudo tee /usr/share/keyrings/hashicorp-archive-keyring.gpg > /dev/null"
+
+echo "2. 添加 APT 源..."
+# Note: single quotes defer $(lsb_release -cs) to the remote host (it previously
+# expanded locally); trusted=yes is dropped so the signed-by keyring is actually enforced
+ssh $TEST_NODE 'echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list'
+
+echo "3. 更新包列表..."
+ssh $TEST_NODE "sudo apt update"
+
+echo "4. 检查可用的 Consul 版本..."
+ssh $TEST_NODE "apt-cache policy consul"
+
+echo "5. 测试安装 Consul..."
+ssh $TEST_NODE "sudo apt install -y consul=1.21.5-*"
+
+if [ $? -eq 0 ]; then
+    echo "✅ Consul 安装成功"
+
+    echo "6. 验证安装..."
+    ssh $TEST_NODE "consul version"
+    ssh $TEST_NODE "which consul"
+
+    echo "7. 检查服务状态..."
+    ssh $TEST_NODE "systemctl status consul --no-pager"
+
+else
+    echo "❌ Consul 安装失败"
+    exit 1
+fi
+
+echo ""
+echo "🎉 测试完成!"
+echo "现在可以运行完整的 Ansible playbook"
diff --git a/scripts/testing/infrastructure/test-nomad-config.sh b/scripts/testing/infrastructure/test-nomad-config.sh
deleted file mode 100755
index ad2132b..0000000
--- a/scripts/testing/infrastructure/test-nomad-config.sh
+++ /dev/null
@@ -1,19 +0,0 @@
-#!/bin/bash
-
-# 测试Nomad配置文件
-CONFIG_FILE=$1
-
-if [ -z "$CONFIG_FILE" ]; then
-  echo "请提供配置文件路径"
-  exit 1
-fi
-
-if [ !
-f "$CONFIG_FILE" ]; then - echo "配置文件不存在: $CONFIG_FILE" - exit 1 -fi - -echo "测试配置文件: $CONFIG_FILE" - -# 尝试使用nomad agent命令测试配置 -nomad agent -config="$CONFIG_FILE" -config-test 2>&1 | head -20 \ No newline at end of file diff --git a/scripts/testing/infrastructure/test-traefik-deployment.sh b/scripts/testing/infrastructure/test-traefik-deployment.sh deleted file mode 100755 index 6762610..0000000 --- a/scripts/testing/infrastructure/test-traefik-deployment.sh +++ /dev/null @@ -1,275 +0,0 @@ -#!/bin/bash - -# Traefik部署测试脚本 -# 用于测试Traefik在Nomad集群中的部署和功能 - -set -e - -# 颜色定义 -RED='\033[0;31m' -GREEN='\033[0;32m' -YELLOW='\033[1;33m' -NC='\033[0m' # No Color - -# 日志函数 -log_info() { - echo -e "${GREEN}[INFO]${NC} $1" -} - -log_warn() { - echo -e "${YELLOW}[WARN]${NC} $1" -} - -log_error() { - echo -e "${RED}[ERROR]${NC} $1" -} - -# 检查Nomad集群状态 -check_nomad_cluster() { - log_info "检查Nomad集群状态..." - - # 使用我们之前创建的领导者发现脚本 - if [ -f "/root/mgmt/scripts/nomad-leader-discovery.sh" ]; then - chmod +x /root/mgmt/scripts/nomad-leader-discovery.sh - LEADER_INFO=$(/root/mgmt/scripts/nomad-leader-discovery.sh -c 2>&1) - log_info "Nomad领导者信息: $LEADER_INFO" - else - log_warn "未找到Nomad领导者发现脚本,使用默认方式检查" - nomad server members 2>/dev/null || log_error "无法连接到Nomad集群" - fi -} - -# 检查Consul集群状态 -check_consul_cluster() { - log_info "检查Consul集群状态..." - - consul members 2>/dev/null || log_error "无法连接到Consul集群" - - # 检查Consul领导者 - CONSUL_LEADER=$(curl -s http://127.0.0.1:8500/v1/status/leader) - if [ -n "$CONSUL_LEADER" ]; then - log_info "Consul领导者: $CONSUL_LEADER" - else - log_error "无法获取Consul领导者信息" - fi -} - -# 部署Traefik -deploy_traefik() { - log_info "部署Traefik..." - - # 检查作业文件是否存在 - if [ ! -f "/root/mgmt/jobs/traefik.nomad" ]; then - log_error "Traefik作业文件不存在: /root/mgmt/jobs/traefik.nomad" - exit 1 - fi - - # 部署作业 - nomad run /root/mgmt/jobs/traefik.nomad - - # 等待部署完成 - log_info "等待Traefik部署完成..." - sleep 10 - - # 检查作业状态 - nomad status traefik -} - -# 检查Traefik状态 -check_traefik_status() { - log_info "检查Traefik状态..." - - # 检查作业状态 - JOB_STATUS=$(nomad job status traefik -json | jq -r '.Status') - if [ "$JOB_STATUS" == "running" ]; then - log_info "Traefik作业状态: $JOB_STATUS" - else - log_error "Traefik作业状态异常: $JOB_STATUS" - return 1 - fi - - # 检查分配状态 - ALLOCATIONS=$(nomad job allocs traefik | tail -n +3 | head -n -1 | awk '{print $1}') - for alloc in $ALLOCATIONS; do - alloc_status=$(nomad alloc status $alloc -json | jq -r '.ClientStatus') - if [ "$alloc_status" == "running" ]; then - log_info "分配 $alloc 状态: $alloc_status" - else - log_error "分配 $alloc 状态异常: $alloc_status" - fi - done - - # 检查服务注册 - log_info "检查Consul中的服务注册..." - consul catalog services | grep traefik && log_info "Traefik服务已注册到Consul" || log_warn "Traefik服务未注册到Consul" -} - -# 测试Traefik功能 -test_traefik_functionality() { - log_info "测试Traefik功能..." - - # 获取Traefik服务地址 - TRAEFIK_ADDR=$(consul catalog service traefik | jq -r '.[0].ServiceAddress' 2>/dev/null) - if [ -z "$TRAEFIK_ADDR" ]; then - log_warn "无法从Consul获取Traefik地址,使用本地地址" - TRAEFIK_ADDR="127.0.0.1" - fi - - # 测试API端点 - log_info "测试Traefik API端点..." - if curl -s http://$TRAEFIK_ADDR:8080/ping > /dev/null; then - log_info "Traefik API端点响应正常" - else - log_error "Traefik API端点无响应" - fi - - # 测试仪表板 - log_info "测试Traefik仪表板..." - if curl -s http://$TRAEFIK_ADDR:8080/dashboard/ > /dev/null; then - log_info "Traefik仪表板可访问" - else - log_error "无法访问Traefik仪表板" - fi - - # 测试HTTP入口点 - log_info "测试HTTP入口点..." 
- if curl -s -I http://$TRAEFIK_ADDR:80 | grep -q "Location: https://"; then - log_info "HTTP到HTTPS重定向正常工作" - else - log_warn "HTTP到HTTPS重定向可能未正常工作" - fi -} - -# 创建测试服务 -create_test_service() { - log_info "创建测试服务..." - - # 创建一个简单的测试服务作业文件 - cat > /tmp/test-service.nomad << EOF -job "test-web" { - datacenters = ["dc1"] - type = "service" - - group "web" { - count = 1 - - network { - port "http" { - to = 8080 - } - } - - task "nginx" { - driver = "podman" - - config { - image = "nginx:alpine" - ports = ["http"] - } - - resources { - cpu = 100 - memory = 64 - } - - service { - name = "test-web" - port = "http" - tags = [ - "traefik.enable=true", - "traefik.http.routers.test-web.rule=Host(`test-web.service.consul`)", - "traefik.http.routers.test-web.entrypoints=https" - ] - - check { - type = "http" - path = "/" - interval = "10s" - timeout = "2s" - } - } - } - } -} -EOF - - # 部署测试服务 - nomad run /tmp/test-service.nomad - - # 等待服务启动 - sleep 15 - - # 测试服务是否可通过Traefik访问 - log_info "测试服务是否可通过Traefik访问..." - if curl -s -H "Host: test-web.service.consul" http://$TRAEFIK_ADDR:80 | grep -q "Welcome to nginx"; then - log_info "测试服务可通过Traefik正常访问" - else - log_error "无法通过Traefik访问测试服务" - fi -} - -# 清理测试资源 -cleanup_test_resources() { - log_info "清理测试资源..." - - # 停止测试服务 - nomad job stop test-web 2>/dev/null || true - nomad job purge test-web 2>/dev/null || true - - # 停止Traefik - nomad job stop traefik 2>/dev/null || true - nomad job purge traefik 2>/dev/null || true - - # 删除临时文件 - rm -f /tmp/test-service.nomad - - log_info "清理完成" -} - -# 主函数 -main() { - case "${1:-all}" in - "check") - check_nomad_cluster - check_consul_cluster - ;; - "deploy") - deploy_traefik - ;; - "status") - check_traefik_status - ;; - "test") - test_traefik_functionality - ;; - "test-service") - create_test_service - ;; - "cleanup") - cleanup_test_resources - ;; - "all") - check_nomad_cluster - check_consul_cluster - deploy_traefik - check_traefik_status - test_traefik_functionality - create_test_service - log_info "所有测试完成" - ;; - *) - echo "用法: $0 {check|deploy|status|test|test-service|cleanup|all}" - echo " check - 检查集群状态" - echo " deploy - 部署Traefik" - echo " status - 检查Traefik状态" - echo " test - 测试Traefik功能" - echo " test-service - 创建并测试示例服务" - echo " cleanup - 清理测试资源" - echo " all - 执行所有步骤(默认)" - exit 1 - ;; - esac -} - -# 执行主函数 -main "$@" \ No newline at end of file diff --git a/scripts/testing/integration/verify-vault-consul-integration.sh b/scripts/testing/integration/verify-vault-consul-integration.sh deleted file mode 100755 index 3c2aa5f..0000000 --- a/scripts/testing/integration/verify-vault-consul-integration.sh +++ /dev/null @@ -1,117 +0,0 @@ -#!/bin/bash -# 验证Vault与Consul集成状态 - -echo "===== 验证Vault与Consul集成 =====" - -# 颜色定义 -GREEN='\033[0;32m' -YELLOW='\033[1;33m' -RED='\033[0;31m' -NC='\033[0m' # No Color - -# 函数定义 -log_info() { - echo -e "${GREEN}[INFO]${NC} $1" -} - -log_warn() { - echo -e "${YELLOW}[WARN]${NC} $1" -} - -log_error() { - echo -e "${RED}[ERROR]${NC} $1" -} - -# 1. 检查Vault状态 -log_info "1. 检查Vault状态" -source /root/mgmt/security/secrets/vault/dev/vault_env.sh -vault_status=$(vault status 2>/dev/null) -if [ $? -eq 0 ]; then - echo "$vault_status" - storage_type=$(echo "$vault_status" | grep "Storage Type" | awk '{print $3}') - if [ "$storage_type" = "consul" ]; then - log_info "✓ Vault正在使用Consul作为存储后端" - else - log_error "✗ Vault未使用Consul作为存储后端" - exit 1 - fi -else - log_error "✗ 无法连接到Vault" - exit 1 -fi - -# 2. 检查Consul集群状态 -log_info "" -log_info "2. 
检查Consul集群状态" -consul_members=$(consul members 2>/dev/null) -if [ $? -eq 0 ]; then - echo "$consul_members" - alive_count=$(echo "$consul_members" | grep -c "alive") - if [ "$alive_count" -ge 1 ]; then - log_info "✓ Consul集群正在运行" - else - log_error "✗ Consul集群无活动节点" - fi -else - log_error "✗ 无法连接到Consul" -fi - -# 3. 检查Consul中的Vault数据 -log_info "" -log_info "3. 检查Consul中的Vault数据" -vault_data=$(curl -s http://100.117.106.136:8500/v1/kv/vault/?recurse 2>/dev/null) -if [ $? -eq 0 ] && [ -n "$vault_data" ]; then - keys_count=$(echo "$vault_data" | jq length) - log_info "✓ Consul中存储了 $keys_count 个Vault相关键值对" - - # 显示一些关键的Vault数据 - echo "关键Vault数据键:" - echo "$vault_data" | jq -r '.[].Key' | head -10 -else - log_error "✗ 无法从Consul获取Vault数据" -fi - -# 4. 验证Vault数据读写 -log_info "" -log_info "4. 验证Vault数据读写" -# 写入测试数据 -test_write=$(vault kv put secret/integration-test/test-key test_value="integration_test_$(date +%s)" 2>&1) -if echo "$test_write" | grep -q "Success"; then - log_info "✓ 成功写入测试数据到Vault" - - # 读取测试数据 - test_read=$(vault kv get secret/integration-test/test-key 2>&1) - if echo "$test_read" | grep -q "test_value"; then - log_info "✓ 成功从Vault读取测试数据" - echo "$test_read" - else - log_error "✗ 无法从Vault读取测试数据" - echo "$test_read" - fi - - # 清理测试数据 - vault kv delete secret/integration-test/test-key >/dev/null 2>&1 -else - log_error "✗ 无法写入测试数据到Vault" - echo "$test_write" -fi - -# 5. 检查Vault集群状态 -log_info "" -log_info "5. 检查Vault集群状态" -cluster_status=$(vault operator raft list-peers 2>&1) -if echo "$cluster_status" | grep -q "executable file not found"; then - log_info "✓ 使用Consul存储后端(非Raft存储)" -else - echo "$cluster_status" -fi - -# 6. 总结 -log_info "" -log_info "===== 集成验证总结 =====" -log_info "✓ Vault已成功集成Consul作为存储后端" -log_info "✓ Consul集群正常运行" -log_info "✓ Vault数据已存储在Consul中" -log_info "✓ Vault读写功能正常" - -log_warn "注意:这是开发环境配置,生产环境请遵循安全策略" \ No newline at end of file diff --git a/scripts/testing/mcp/test_direct_search.sh b/scripts/testing/mcp/test_direct_search.sh deleted file mode 100755 index 0b0e099..0000000 --- a/scripts/testing/mcp/test_direct_search.sh +++ /dev/null @@ -1,32 +0,0 @@ -#!/bin/bash - -echo "直接测试search_documents方法..." - -# 创建一个简单的Python脚本来测试search_documents方法 -ssh ben@dev1 "cd /home/ben/qdrant && source venv/bin/activate && python3 -c \" -import asyncio -import json -import sys -sys.path.append('/home/ben/qdrant') - -from qdrant_ollama_mcp_server import QdrantOllamaMCPServer - -async def test_search(): - server = QdrantOllamaMCPServer() - - # 测试search_documents方法 - params = { - 'query': '人工智能', - 'limit': 3 - } - - try: - result = await server._search_documents(params) - print('搜索结果:', json.dumps(result, indent=2, ensure_ascii=False)) - except Exception as e: - print('搜索错误:', str(e)) - import traceback - traceback.print_exc() - -asyncio.run(test_search()) -\"" \ No newline at end of file diff --git a/scripts/testing/mcp/test_local_mcp_servers.sh b/scripts/testing/mcp/test_local_mcp_servers.sh deleted file mode 100755 index 165f7fb..0000000 --- a/scripts/testing/mcp/test_local_mcp_servers.sh +++ /dev/null @@ -1,61 +0,0 @@ -#!/bin/bash - -# 测试当前环境中的MCP服务器 - -echo "测试当前环境中的MCP服务器..." - -# 检查当前环境中是否有MCP配置 -echo "检查MCP配置..." -if [ -f "/root/.mcp/mcp_settings.json" ]; then - echo "找到MCP配置文件: /root/.mcp/mcp_settings.json" - cat /root/.mcp/mcp_settings.json -else - echo "未找到MCP配置文件: /root/.mcp/mcp_settings.json" -fi - -echo "" -echo "检查.kilocode/mcp.json..." 
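Beyond printing the file, it is worth confirming the config is parseable JSON; `jq empty` exits non-zero on a syntax error. A small hedged addition to the check below:

```bash
if jq empty /root/mgmt/.kilocode/mcp.json 2>/dev/null; then
    echo "mcp.json parses as valid JSON"
else
    echo "mcp.json is missing or not valid JSON"
fi
```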
-if [ -f "/root/mgmt/.kilocode/mcp.json" ]; then - echo "找到MCP配置文件: /root/mgmt/.kilocode/mcp.json" - cat /root/mgmt/.kilocode/mcp.json -else - echo "未找到MCP配置文件: /root/mgmt/.kilocode/mcp.json" -fi - -echo "" -echo "检查是否有可用的MCP服务器..." - -# 检查context7服务器 -echo "测试context7服务器..." -echo '{"jsonrpc":"2.0","id":1,"method":"tools/list"}' | nc localhost 8080 2>/dev/null || echo "context7服务器未在本地运行" - -# 检查qdrant服务器 -echo "测试qdrant服务器..." -if [ -f "/root/mgmt/qdrant_mcp_server.py" ]; then - echo "找到qdrant服务器脚本: /root/mgmt/qdrant_mcp_server.py" - # 尝试直接运行服务器并测试 - echo '{"jsonrpc":"2.0","id":1,"method":"tools/list"}' | python3 /root/mgmt/qdrant_mcp_server.py 2>/dev/null || echo "qdrant服务器无法直接运行" -else - echo "未找到qdrant服务器脚本" -fi - -# 检查qdrant-ollama服务器 -echo "测试qdrant-ollama服务器..." -if [ -f "/root/mgmt/qdrant_ollama_mcp_server.py" ]; then - echo "找到qdrant-ollama服务器脚本: /root/mgmt/qdrant_ollama_mcp_server.py" - # 尝试直接运行服务器并测试 - echo '{"jsonrpc":"2.0","id":1,"method":"tools/list"}' | python3 /root/mgmt/qdrant_ollama_mcp_server.py 2>/dev/null || echo "qdrant-ollama服务器无法直接运行" -else - echo "未找到qdrant-ollama服务器脚本" -fi - -echo "" -echo "检查环境变量..." -echo "QDRANT_URL: ${QDRANT_URL:-未设置}" -echo "QDRANT_API_KEY: ${QDRANT_API_KEY:-未设置}" -echo "OLLAMA_URL: ${OLLAMA_URL:-未设置}" -echo "OLLAMA_MODEL: ${OLLAMA_MODEL:-未设置}" -echo "COLLECTION_NAME: ${COLLECTION_NAME:-未设置}" - -echo "" -echo "测试完成。" \ No newline at end of file diff --git a/scripts/testing/mcp/test_mcp_interface.sh b/scripts/testing/mcp/test_mcp_interface.sh deleted file mode 100755 index 767969d..0000000 --- a/scripts/testing/mcp/test_mcp_interface.sh +++ /dev/null @@ -1,21 +0,0 @@ -#!/bin/bash - -# 测试MCP服务器在实际MCP接口中的调用 - -echo "测试Qdrant MCP服务器..." -echo '{"jsonrpc":"2.0","id":1,"method":"tools/list"}' | ssh ben@dev1 "cd /home/ben/qdrant && source venv/bin/activate && python qdrant_mcp_server.py" - -echo "" -echo "测试Qdrant-Ollama MCP服务器..." -echo '{"jsonrpc":"2.0","id":1,"method":"tools/list"}' | ssh ben@dev1 "cd /home/ben/qdrant && source venv/bin/activate && ./start_mcp_server.sh" - -echo "" -echo "测试Qdrant MCP服务器的搜索功能..." -echo '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"qdrant_search","arguments":{"query":"测试查询","limit":3}}}' | ssh ben@dev1 "cd /home/ben/qdrant && source venv/bin/activate && python qdrant_mcp_server.py" - -echo "" -echo "测试Qdrant-Ollama MCP服务器的搜索功能..." -echo '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"qdrant_search","arguments":{"query":"测试查询","limit":3}}}' | ssh ben@dev1 "cd /home/ben/qdrant && source venv/bin/activate && ./start_mcp_server.sh" - -echo "" -echo "测试完成。" \ No newline at end of file diff --git a/scripts/testing/mcp/test_mcp_search_final.sh b/scripts/testing/mcp/test_mcp_search_final.sh deleted file mode 100755 index 475ef18..0000000 --- a/scripts/testing/mcp/test_mcp_search_final.sh +++ /dev/null @@ -1,15 +0,0 @@ -#!/bin/bash - -echo "测试通过MCP接口调用search_documents工具..." - -# 先添加一个文档 -echo "添加测试文档..." -ssh ben@dev1 "cd /home/ben/qdrant && source venv/bin/activate && echo '{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"tools/call\",\"params\":{\"name\":\"add_document\",\"arguments\":{\"text\":\"机器学习是人工智能的一个子领域,专注于开发能够从数据中学习的算法。\",\"metadata\":{\"source\":\"test\",\"topic\":\"ML\"}}}}' | ./start_mcp_server.sh" - -echo "" -echo "通过MCP接口搜索文档..." 
-# 测试search_documents工具(不带filter参数) -ssh ben@dev1 "cd /home/ben/qdrant && source venv/bin/activate && echo '{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"tools/call\",\"params\":{\"name\":\"search_documents\",\"arguments\":{\"query\":\"机器学习\",\"limit\":3}}}' | ./start_mcp_server.sh" - -echo "" -echo "测试完成。" \ No newline at end of file diff --git a/scripts/testing/mcp/test_mcp_servers.sh b/scripts/testing/mcp/test_mcp_servers.sh deleted file mode 100755 index 59a0a65..0000000 --- a/scripts/testing/mcp/test_mcp_servers.sh +++ /dev/null @@ -1,13 +0,0 @@ -#!/bin/bash - -# 测试MCP服务器脚本 - -echo "测试Qdrant MCP服务器..." -echo '{"jsonrpc":"2.0","id":1,"method":"initialize"}' | ssh ben@dev1 "cd /home/ben/qdrant && source venv/bin/activate && python qdrant_mcp_server.py" - -echo "" -echo "测试Qdrant-Ollama MCP服务器..." -echo '{"jsonrpc":"2.0","id":1,"method":"initialize"}' | ssh ben@dev1 "cd /home/ben/qdrant && source venv/bin/activate && ./start_mcp_server.sh" - -echo "" -echo "测试完成。" \ No newline at end of file diff --git a/scripts/testing/mcp/test_mcp_servers_comprehensive.py b/scripts/testing/mcp/test_mcp_servers_comprehensive.py deleted file mode 100755 index cc31747..0000000 --- a/scripts/testing/mcp/test_mcp_servers_comprehensive.py +++ /dev/null @@ -1,158 +0,0 @@ -#!/usr/bin/env python3 -""" -测试MCP服务器的脚本 -""" - -import asyncio -import json -import subprocess -import sys -from typing import Dict, Any, List - -async def test_mcp_server(server_name: str, command: List[str], env: Dict[str, str] = None): - """测试MCP服务器""" - print(f"\n=== 测试 {server_name} 服务器 ===") - - # 设置环境变量 - process_env = {} - if env: - process_env.update(env) - - try: - # 启动服务器进程 - process = await asyncio.create_subprocess_exec( - *command, - stdin=asyncio.subprocess.PIPE, - stdout=asyncio.subprocess.PIPE, - stderr=asyncio.subprocess.PIPE, - env=process_env - ) - - # 初始化请求 - init_request = { - "jsonrpc": "2.0", - "id": 1, - "method": "initialize", - "params": { - "protocolVersion": "2024-11-05", - "capabilities": { - "tools": {} - } - } - } - - # 发送初始化请求 - process.stdin.write((json.dumps(init_request) + "\n").encode()) - await process.stdin.drain() - - # 读取初始化响应 - init_response = await process.stdout.readline() - if init_response: - try: - init_data = json.loads(init_response.decode()) - print(f"初始化响应: {init_data}") - except json.JSONDecodeError: - print(f"初始化响应解析失败: {init_response}") - - # 获取工具列表 - tools_request = { - "jsonrpc": "2.0", - "id": 2, - "method": "tools/list" - } - - # 发送工具列表请求 - process.stdin.write((json.dumps(tools_request) + "\n").encode()) - await process.stdin.drain() - - # 读取工具列表响应 - tools_response = await process.stdout.readline() - if tools_response: - try: - tools_data = json.loads(tools_response.decode()) - print(f"工具列表: {json.dumps(tools_data, indent=2, ensure_ascii=False)}") - - # 如果有搜索工具,测试搜索功能 - if "result" in tools_data and "tools" in tools_data["result"]: - for tool in tools_data["result"]["tools"]: - tool_name = tool.get("name") - if tool_name and ("search" in tool_name or "document" in tool_name): - print(f"\n测试工具: {tool_name}") - - # 测试搜索工具 - search_request = { - "jsonrpc": "2.0", - "id": 3, - "method": "tools/call", - "params": { - "name": tool_name, - "arguments": { - "query": "测试查询", - "limit": 3 - } - } - } - - # 发送搜索请求 - process.stdin.write((json.dumps(search_request) + "\n").encode()) - await process.stdin.drain() - - # 读取搜索响应 - search_response = await process.stdout.readline() - if search_response: - try: - search_data = json.loads(search_response.decode()) - print(f"搜索结果: {json.dumps(search_data, 
indent=2, ensure_ascii=False)}") - except json.JSONDecodeError: - print(f"搜索响应解析失败: {search_response}") - break - except json.JSONDecodeError: - print(f"工具列表响应解析失败: {tools_response}") - - # 关闭进程 - process.stdin.close() - await process.wait() - - except Exception as e: - print(f"测试 {server_name} 服务器时出错: {e}") - -async def main(): - """主函数""" - print("开始测试MCP服务器...") - - # 测试context7服务器 - await test_mcp_server( - "context7", - ["npx", "-y", "@upstash/context7-mcp"], - {"DEFAULT_MINIMUM_TOKENS": ""} - ) - - # 测试qdrant服务器 - await test_mcp_server( - "qdrant", - ["ssh", "ben@dev1", "cd /home/ben/qdrant && source venv/bin/activate && python qdrant_mcp_server.py"], - { - "QDRANT_URL": "http://dev1:6333", - "QDRANT_API_KEY": "313131", - "COLLECTION_NAME": "mcp", - "EMBEDDING_MODEL": "bge-m3" - } - ) - - # 测试qdrant-ollama服务器 - await test_mcp_server( - "qdrant-ollama", - ["ssh", "ben@dev1", "cd /home/ben/qdrant && source venv/bin/activate && ./start_mcp_server.sh"], - { - "QDRANT_URL": "http://dev1:6333", - "QDRANT_API_KEY": "313131", - "COLLECTION_NAME": "ollama_mcp", - "OLLAMA_MODEL": "nomic-embed-text", - "OLLAMA_URL": "http://dev1:11434" - } - ) - - print("\n所有测试完成。") - -if __name__ == "__main__": - asyncio.run(main()) \ No newline at end of file diff --git a/scripts/testing/mcp/test_mcp_servers_improved.py b/scripts/testing/mcp/test_mcp_servers_improved.py deleted file mode 100755 index aebff54..0000000 --- a/scripts/testing/mcp/test_mcp_servers_improved.py +++ /dev/null @@ -1,198 +0,0 @@ -#!/usr/bin/env python3 -""" -改进的MCP服务器测试脚本 -""" - -import asyncio -import json -import subprocess -import sys -from typing import Dict, Any, List, Optional - -async def test_mcp_server(server_name: str, command: List[str], env: Dict[str, str] = None): - """测试MCP服务器""" - print(f"\n=== 测试 {server_name} 服务器 ===") - - # 设置环境变量 - process_env = {} - if env: - process_env.update(env) - - try: - # 启动服务器进程 - process = await asyncio.create_subprocess_exec( - *command, - stdin=asyncio.subprocess.PIPE, - stdout=asyncio.subprocess.PIPE, - stderr=asyncio.subprocess.PIPE, - env=process_env - ) - - # 读取并忽略所有非JSON输出 - buffer = "" - while True: - line = await process.stdout.readline() - if not line: - break - - line_str = line.decode().strip() - buffer += line_str + "\n" - - # 尝试解析JSON - try: - data = json.loads(line_str) - if "jsonrpc" in data: - print(f"收到JSON响应: {json.dumps(data, indent=2, ensure_ascii=False)}") - break - except json.JSONDecodeError: - # 不是JSON,继续读取 - continue - - # 如果没有找到JSON响应,显示缓冲区内容 - if "jsonrpc" not in locals(): - print(f"未找到JSON响应,原始输出: {buffer}") - return - - # 初始化请求 - init_request = { - "jsonrpc": "2.0", - "id": 1, - "method": "initialize", - "params": { - "protocolVersion": "2024-11-05", - "capabilities": { - "tools": {} - } - } - } - - # 发送初始化请求 - process.stdin.write((json.dumps(init_request) + "\n").encode()) - await process.stdin.drain() - - # 读取初始化响应 - init_response = await read_json_response(process) - if init_response: - print(f"初始化成功") - - # 获取工具列表 - tools_request = { - "jsonrpc": "2.0", - "id": 2, - "method": "tools/list" - } - - # 发送工具列表请求 - process.stdin.write((json.dumps(tools_request) + "\n").encode()) - await process.stdin.drain() - - # 读取工具列表响应 - tools_response = await read_json_response(process) - if tools_response: - print(f"工具列表获取成功") - - # 如果有搜索工具,测试搜索功能 - if "result" in tools_response and "tools" in tools_response["result"]: - for tool in tools_response["result"]["tools"]: - tool_name = tool.get("name") - if tool_name and ("search" in tool_name or "document" in tool_name): - 
print(f"\n测试工具: {tool_name}") - - # 测试搜索工具 - search_request = { - "jsonrpc": "2.0", - "id": 3, - "method": "tools/call", - "params": { - "name": tool_name, - "arguments": { - "query": "测试查询", - "limit": 3 - } - } - } - - # 发送搜索请求 - process.stdin.write((json.dumps(search_request) + "\n").encode()) - await process.stdin.drain() - - # 读取搜索响应 - search_response = await read_json_response(process) - if search_response: - print(f"搜索测试成功") - if "result" in search_response and "content" in search_response["result"]: - for content in search_response["result"]["content"]: - if content.get("type") == "text": - print(f"搜索结果: {content.get('text', '')[:100]}...") - break - - # 关闭进程 - process.stdin.close() - await process.wait() - - except Exception as e: - print(f"测试 {server_name} 服务器时出错: {e}") - -async def read_json_response(process): - """读取JSON响应""" - buffer = "" - while True: - line = await process.stdout.readline() - if not line: - break - - line_str = line.decode().strip() - buffer += line_str + "\n" - - # 尝试解析JSON - try: - data = json.loads(line_str) - if "jsonrpc" in data: - return data - except json.JSONDecodeError: - # 不是JSON,继续读取 - continue - - # 如果没有找到JSON响应,返回None - return None - -async def main(): - """主函数""" - print("开始测试MCP服务器...") - - # 测试context7服务器 - await test_mcp_server( - "context7", - ["npx", "-y", "@upstash/context7-mcp"], - {"DEFAULT_MINIMUM_TOKENS": ""} - ) - - # 测试qdrant服务器 - await test_mcp_server( - "qdrant", - ["ssh", "ben@dev1", "cd /home/ben/qdrant && source venv/bin/activate && python qdrant_mcp_server.py"], - { - "QDRANT_URL": "http://dev1:6333", - "QDRANT_API_KEY": "313131", - "COLLECTION_NAME": "mcp", - "EMBEDDING_MODEL": "bge-m3" - } - ) - - # 测试qdrant-ollama服务器 - await test_mcp_server( - "qdrant-ollama", - ["ssh", "ben@dev1", "cd /home/ben/qdrant && source venv/bin/activate && ./start_mcp_server.sh"], - { - "QDRANT_URL": "http://dev1:6333", - "QDRANT_API_KEY": "313131", - "COLLECTION_NAME": "ollama_mcp", - "OLLAMA_MODEL": "nomic-embed-text", - "OLLAMA_URL": "http://dev1:11434" - } - ) - - print("\n所有测试完成。") - -if __name__ == "__main__": - asyncio.run(main()) \ No newline at end of file diff --git a/scripts/testing/mcp/test_mcp_servers_simple.py b/scripts/testing/mcp/test_mcp_servers_simple.py deleted file mode 100755 index 6440ed9..0000000 --- a/scripts/testing/mcp/test_mcp_servers_simple.py +++ /dev/null @@ -1,167 +0,0 @@ -#!/usr/bin/env python3 -""" -简化的MCP服务器测试脚本 -""" - -import json -import subprocess -import sys -import time -from typing import Dict, Any, List - -def test_mcp_server(server_name: str, command: List[str], env: Dict[str, str] = None): - """测试MCP服务器""" - print(f"\n=== 测试 {server_name} 服务器 ===") - - # 设置环境变量 - process_env = {} - if env: - process_env.update(env) - - try: - # 启动服务器进程 - process = subprocess.Popen( - command, - stdin=subprocess.PIPE, - stdout=subprocess.PIPE, - stderr=subprocess.PIPE, - env=process_env, - text=True - ) - - # 等待进程启动 - time.sleep(2) - - # 初始化请求 - init_request = { - "jsonrpc": "2.0", - "id": 1, - "method": "initialize", - "params": { - "protocolVersion": "2024-11-05", - "capabilities": { - "tools": {} - } - } - } - - # 发送初始化请求 - process.stdin.write(json.dumps(init_request) + "\n") - process.stdin.flush() - - # 读取初始化响应 - init_response = process.stdout.readline() - if init_response: - try: - init_data = json.loads(init_response.strip()) - print(f"初始化成功: {init_data.get('result', {}).get('serverInfo', {}).get('name', '未知服务器')}") - except json.JSONDecodeError: - print(f"初始化响应解析失败: {init_response}") - - # 获取工具列表 - tools_request = 
{ - "jsonrpc": "2.0", - "id": 2, - "method": "tools/list" - } - - # 发送工具列表请求 - process.stdin.write(json.dumps(tools_request) + "\n") - process.stdin.flush() - - # 读取工具列表响应 - tools_response = process.stdout.readline() - if tools_response: - try: - tools_data = json.loads(tools_response.strip()) - print(f"工具列表获取成功") - - # 如果有搜索工具,测试搜索功能 - if "result" in tools_data and "tools" in tools_data["result"]: - for tool in tools_data["result"]["tools"]: - tool_name = tool.get("name") - if tool_name and ("search" in tool_name or "document" in tool_name): - print(f"\n测试工具: {tool_name}") - - # 测试搜索工具 - search_request = { - "jsonrpc": "2.0", - "id": 3, - "method": "tools/call", - "params": { - "name": tool_name, - "arguments": { - "query": "测试查询", - "limit": 3 - } - } - } - - # 发送搜索请求 - process.stdin.write(json.dumps(search_request) + "\n") - process.stdin.flush() - - # 读取搜索响应 - search_response = process.stdout.readline() - if search_response: - try: - search_data = json.loads(search_response.strip()) - print(f"搜索测试成功") - if "result" in search_data and "content" in search_data["result"]: - for content in search_data["result"]["content"]: - if content.get("type") == "text": - print(f"搜索结果: {content.get('text', '')[:100]}...") - except json.JSONDecodeError: - print(f"搜索响应解析失败: {search_response}") - break - except json.JSONDecodeError: - print(f"工具列表响应解析失败: {tools_response}") - - # 关闭进程 - process.stdin.close() - process.terminate() - process.wait() - - except Exception as e: - print(f"测试 {server_name} 服务器时出错: {e}") - -def main(): - """主函数""" - print("开始测试MCP服务器...") - - # 测试context7服务器 - test_mcp_server( - "context7", - ["npx", "-y", "@upstash/context7-mcp"], - {"DEFAULT_MINIMUM_TOKENS": ""} - ) - - # 测试qdrant服务器 - test_mcp_server( - "qdrant", - ["ssh", "ben@dev1", "cd /home/ben/qdrant && source venv/bin/activate && python qdrant_mcp_server.py"], - { - "QDRANT_URL": "http://dev1:6333", - "QDRANT_API_KEY": "313131", - "COLLECTION_NAME": "mcp", - "EMBEDDING_MODEL": "bge-m3" - } - ) - - # 测试qdrant-ollama服务器 - test_mcp_server( - "qdrant-ollama", - ["ssh", "ben@dev1", "cd /home/ben/qdrant && source venv/bin/activate && ./start_mcp_server.sh"], - { - "QDRANT_URL": "http://dev1:6333", - "QDRANT_API_KEY": "313131", - "COLLECTION_NAME": "ollama_mcp", - "OLLAMA_MODEL": "nomic-embed-text", - "OLLAMA_URL": "http://dev1:11434" - } - ) - - print("\n所有测试完成。") - -if __name__ == "__main__": - main() \ No newline at end of file diff --git a/scripts/testing/mcp/test_qdrant_ollama_server.py b/scripts/testing/mcp/test_qdrant_ollama_server.py deleted file mode 100755 index 90b77a7..0000000 --- a/scripts/testing/mcp/test_qdrant_ollama_server.py +++ /dev/null @@ -1,189 +0,0 @@ -#!/usr/bin/env python3 -""" -专门测试qdrant-ollama服务器的脚本 -""" - -import json -import subprocess -import sys -import time -from typing import Dict, Any, List - -def test_qdrant_ollama_server(): - """测试qdrant-ollama服务器""" - print("\n=== 测试 qdrant-ollama 服务器 ===") - - try: - # 启动服务器进程 - process = subprocess.Popen( - ["ssh", "ben@dev1", "cd /home/ben/qdrant && source venv/bin/activate && ./start_mcp_server.sh"], - stdin=subprocess.PIPE, - stdout=subprocess.PIPE, - stderr=subprocess.PIPE, - text=True - ) - - # 读取并忽略所有非JSON输出 - buffer = "" - json_found = False - - # 等待进程启动并读取初始输出 - for _ in range(10): # 最多尝试10次 - line = process.stdout.readline() - if not line: - time.sleep(0.5) - continue - - line = line.strip() - buffer += line + "\n" - - # 尝试解析JSON - try: - data = json.loads(line) - if "jsonrpc" in data: - json_found = True - print(f"收到JSON响应: {json.dumps(data, 
indent=2, ensure_ascii=False)}") - break - except json.JSONDecodeError: - # 不是JSON,继续读取 - continue - - if not json_found: - print(f"未找到JSON响应,原始输出: {buffer}") - process.terminate() - process.wait() - return - - # 初始化请求 - init_request = { - "jsonrpc": "2.0", - "id": 1, - "method": "initialize", - "params": { - "protocolVersion": "2024-11-05", - "capabilities": { - "tools": {} - } - } - } - - # 发送初始化请求 - process.stdin.write(json.dumps(init_request) + "\n") - process.stdin.flush() - - # 读取初始化响应 - init_response = process.stdout.readline() - if init_response: - try: - init_data = json.loads(init_response.strip()) - print(f"初始化成功: {init_data.get('result', {}).get('serverInfo', {}).get('name', '未知服务器')}") - except json.JSONDecodeError: - print(f"初始化响应解析失败: {init_response}") - - # 获取工具列表 - tools_request = { - "jsonrpc": "2.0", - "id": 2, - "method": "tools/list" - } - - # 发送工具列表请求 - process.stdin.write(json.dumps(tools_request) + "\n") - process.stdin.flush() - - # 读取工具列表响应 - tools_response = process.stdout.readline() - if tools_response: - try: - tools_data = json.loads(tools_response.strip()) - print(f"工具列表获取成功") - - # 如果有搜索工具,测试搜索功能 - if "result" in tools_data and "tools" in tools_data["result"]: - for tool in tools_data["result"]["tools"]: - tool_name = tool.get("name") - if tool_name and ("search" in tool_name or "document" in tool_name): - print(f"\n测试工具: {tool_name}") - - # 先添加一个文档 - add_request = { - "jsonrpc": "2.0", - "id": 3, - "method": "tools/call", - "params": { - "name": "add_document", - "arguments": { - "text": "这是一个测试文档,用于验证qdrant-ollama服务器的功能。", - "metadata": { - "source": "test", - "topic": "测试" - } - } - } - } - - # 发送添加文档请求 - process.stdin.write(json.dumps(add_request) + "\n") - process.stdin.flush() - - # 读取添加文档响应 - add_response = process.stdout.readline() - if add_response: - try: - add_data = json.loads(add_response.strip()) - print(f"添加文档测试成功") - except json.JSONDecodeError: - print(f"添加文档响应解析失败: {add_response}") - - # 测试搜索工具 - search_request = { - "jsonrpc": "2.0", - "id": 4, - "method": "tools/call", - "params": { - "name": tool_name, - "arguments": { - "query": "测试文档", - "limit": 3 - } - } - } - - # 发送搜索请求 - process.stdin.write(json.dumps(search_request) + "\n") - process.stdin.flush() - - # 读取搜索响应 - search_response = process.stdout.readline() - if search_response: - try: - search_data = json.loads(search_response.strip()) - print(f"搜索测试成功") - if "result" in search_data and "content" in search_data["result"]: - for content in search_data["result"]["content"]: - if content.get("type") == "text": - print(f"搜索结果: {content.get('text', '')[:100]}...") - except json.JSONDecodeError: - print(f"搜索响应解析失败: {search_response}") - break - except json.JSONDecodeError: - print(f"工具列表响应解析失败: {tools_response}") - - # 关闭进程 - process.stdin.close() - process.terminate() - process.wait() - - except Exception as e: - print(f"测试 qdrant-ollama 服务器时出错: {e}") - -def main(): - """主函数""" - print("开始测试qdrant-ollama服务器...") - - test_qdrant_ollama_server() - - print("\n测试完成。") - -if __name__ == "__main__": - main() \ No newline at end of file diff --git a/scripts/testing/mcp/test_qdrant_ollama_tools.sh b/scripts/testing/mcp/test_qdrant_ollama_tools.sh deleted file mode 100755 index 10f2917..0000000 --- a/scripts/testing/mcp/test_qdrant_ollama_tools.sh +++ /dev/null @@ -1,15 +0,0 @@ -#!/bin/bash - -echo "测试Qdrant-Ollama MCP服务器的search_documents工具..." 
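-# 补充说明(非原脚本内容):下面直接发送 tools/call 请求,未先完成 initialize 握手;
-# 部分 MCP 实现可能要求先初始化,若调用失败可先通过同样的管道发送 initialize 请求。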
- -# 测试search_documents工具 -ssh ben@dev1 "cd /home/ben/qdrant && source venv/bin/activate && echo '{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"tools/call\",\"params\":{\"name\":\"search_documents\",\"arguments\":{\"query\":\"测试查询\",\"limit\":3}}}' | ./start_mcp_server.sh" - -echo "" -echo "测试Qdrant-Ollama MCP服务器的add_document工具..." - -# 测试add_document工具 -ssh ben@dev1 "cd /home/ben/qdrant && source venv/bin/activate && echo '{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"tools/call\",\"params\":{\"name\":\"add_document\",\"arguments\":{\"text\":\"这是一个测试文档\",\"metadata\":{\"source\":\"test\"}}}}' | ./start_mcp_server.sh" - -echo "" -echo "测试完成。" \ No newline at end of file diff --git a/scripts/testing/mcp/test_qdrant_ollama_tools_fixed.sh b/scripts/testing/mcp/test_qdrant_ollama_tools_fixed.sh deleted file mode 100755 index 6ab3082..0000000 --- a/scripts/testing/mcp/test_qdrant_ollama_tools_fixed.sh +++ /dev/null @@ -1,21 +0,0 @@ -#!/bin/bash - -echo "测试Qdrant-Ollama MCP服务器的search_documents工具(不带filter参数)..." - -# 测试search_documents工具(不带filter参数) -ssh ben@dev1 "cd /home/ben/qdrant && source venv/bin/activate && echo '{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"tools/call\",\"params\":{\"name\":\"search_documents\",\"arguments\":{\"query\":\"测试查询\",\"limit\":3}}}' | ./start_mcp_server.sh" - -echo "" -echo "测试Qdrant-Ollama MCP服务器的add_document工具..." - -# 测试add_document工具 -ssh ben@dev1 "cd /home/ben/qdrant && source venv/bin/activate && echo '{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"tools/call\",\"params\":{\"name\":\"add_document\",\"arguments\":{\"text\":\"这是一个测试文档\",\"metadata\":{\"source\":\"test\"}}}}' | ./start_mcp_server.sh" - -echo "" -echo "测试Qdrant-Ollama MCP服务器的list_collections工具..." - -# 测试list_collections工具 -ssh ben@dev1 "cd /home/ben/qdrant && source venv/bin/activate && echo '{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"tools/call\",\"params\":{\"name\":\"list_collections\",\"arguments\":{}}}' | ./start_mcp_server.sh" - -echo "" -echo "测试完成。" \ No newline at end of file diff --git a/scripts/testing/mcp/test_search_documents.sh b/scripts/testing/mcp/test_search_documents.sh deleted file mode 100755 index b501f35..0000000 --- a/scripts/testing/mcp/test_search_documents.sh +++ /dev/null @@ -1,15 +0,0 @@ -#!/bin/bash - -echo "测试Qdrant-Ollama MCP服务器的search_documents工具(不带filter参数)..." - -# 先添加一个文档 -echo "添加测试文档..." -ssh ben@dev1 "cd /home/ben/qdrant && source venv/bin/activate && echo '{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"tools/call\",\"params\":{\"name\":\"add_document\",\"arguments\":{\"text\":\"人工智能是计算机科学的一个分支,致力于创建能够执行通常需要人类智能的任务的系统。\",\"metadata\":{\"source\":\"test\",\"topic\":\"AI\"}}}}' | ./start_mcp_server.sh" - -echo "" -echo "搜索文档..." -# 测试search_documents工具(不带filter参数) -ssh ben@dev1 "cd /home/ben/qdrant && source venv/bin/activate && echo '{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"tools/call\",\"params\":{\"name\":\"search_documents\",\"arguments\":{\"query\":\"人工智能\",\"limit\":3}}}' | ./start_mcp_server.sh" - -echo "" -echo "测试完成。" \ No newline at end of file diff --git a/scripts/testing/run_all_tests.sh b/scripts/testing/run_all_tests.sh deleted file mode 100755 index 2ad5493..0000000 --- a/scripts/testing/run_all_tests.sh +++ /dev/null @@ -1,116 +0,0 @@ -#!/bin/bash - -# MCP服务器测试运行器 -# 自动运行所有MCP服务器测试脚本 - -set -e - -# 颜色定义 -RED='\033[0;31m' -GREEN='\033[0;32m' -YELLOW='\033[1;33m' -NC='\033[0m' # No Color - -# 测试目录 -TEST_DIR="/root/mgmt/scripts/testing/mcp" -REPORT_FILE="/root/mgmt/scripts/testing/test_results_$(date +%Y%m%d_%H%M%S).md" - -# 检查测试目录是否存在 -if [ ! 
-d "$TEST_DIR" ]; then - echo -e "${RED}错误: 测试目录 $TEST_DIR 不存在${NC}" - exit 1 -fi - -# 创建测试报告头部 -cat > "$REPORT_FILE" << EOF -# MCP服务器测试报告 - $(date '+%Y-%m-%d %H:%M:%S') - -## 测试环境 -- 测试时间: $(date '+%Y-%m-%d %H:%M:%S') -- 测试目录: $TEST_DIR -- 测试类型: 自动化批量测试 - -## 测试结果概览 - -EOF - -echo -e "${YELLOW}开始运行MCP服务器测试套件...${NC}" -echo -e "${YELLOW}测试报告将保存到: $REPORT_FILE${NC}\n" - -# 测试计数器 -TOTAL_TESTS=0 -PASSED_TESTS=0 -FAILED_TESTS=0 - -# 运行Shell脚本测试 -echo -e "${YELLOW}运行Shell脚本测试...${NC}" -for test_script in "$TEST_DIR"/*.sh; do - if [ -f "$test_script" ]; then - TEST_NAME=$(basename "$test_script") - echo -e "${YELLOW}运行测试: $TEST_NAME${NC}" - - # 运行测试脚本 - if bash "$test_script" >> "$REPORT_FILE" 2>&1; then - echo -e "${GREEN}✅ $TEST_NAME 通过${NC}" - echo "- ✅ $TEST_NAME: 通过" >> "$REPORT_FILE" - ((PASSED_TESTS++)) - else - echo -e "${RED}❌ $TEST_NAME 失败${NC}" - echo "- ❌ $TEST_NAME: 失败" >> "$REPORT_FILE" - ((FAILED_TESTS++)) - fi - ((TOTAL_TESTS++)) - echo - fi -done - -# 运行Python脚本测试 -echo -e "${YELLOW}运行Python脚本测试...${NC}" -for test_script in "$TEST_DIR"/*.py; do - if [ -f "$test_script" ]; then - TEST_NAME=$(basename "$test_script") - echo -e "${YELLOW}运行测试: $TEST_NAME${NC}" - - # 运行Python测试 - if python3 "$test_script" >> "$REPORT_FILE" 2>&1; then - echo -e "${GREEN}✅ $TEST_NAME 通过${NC}" - echo "- ✅ $TEST_NAME: 通过" >> "$REPORT_FILE" - ((PASSED_TESTS++)) - else - echo -e "${RED}❌ $TEST_NAME 失败${NC}" - echo "- ❌ $TEST_NAME: 失败" >> "$REPORT_FILE" - ((FAILED_TESTS++)) - fi - ((TOTAL_TESTS++)) - echo - fi -done - -# 更新测试报告 -cat >> "$REPORT_FILE" << EOF - -## 测试统计 -- 总测试数: $TOTAL_TESTS -- 通过测试: $PASSED_TESTS -- 失败测试: $FAILED_TESTS -- 通过率: $((PASSED_TESTS * 100 / TOTAL_TESTS))% - -## 详细测试输出 -EOF - -# 显示测试结果摘要 -echo -e "\n${YELLOW}=== 测试完成 ===${NC}" -echo -e "总测试数: $TOTAL_TESTS" -echo -e "通过测试: ${GREEN}$PASSED_TESTS${NC}" -echo -e "失败测试: ${RED}$FAILED_TESTS${NC}" -echo -e "通过率: $((PASSED_TESTS * 100 / TOTAL_TESTS))%" -echo -e "详细报告: $REPORT_FILE" - -# 如果所有测试都通过,返回成功 -if [ $FAILED_TESTS -eq 0 ]; then - echo -e "\n${GREEN}所有测试均通过!${NC}" - exit 0 -else - echo -e "\n${RED}部分测试失败,请查看详细报告。${NC}" - exit 1 -fi \ No newline at end of file diff --git a/scripts/testing/test-runner.sh b/scripts/testing/test-runner.sh deleted file mode 100755 index c099d37..0000000 --- a/scripts/testing/test-runner.sh +++ /dev/null @@ -1,35 +0,0 @@ -#!/bin/bash - -# 项目测试快速执行脚本 -# 从项目根目录快速运行所有MCP服务器测试 - -set -e - -# 颜色定义 -RED='\033[0;31m' -GREEN='\033[0;32m' -YELLOW='\033[1;33m' -NC='\033[0m' # No Color - -# 获取脚本所在目录 -SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" -TEST_DIR="$SCRIPT_DIR/tests" - -# 检查测试目录是否存在 -if [ ! -d "$TEST_DIR" ]; then - echo -e "${RED}错误: 测试目录 $TEST_DIR 不存在${NC}" - exit 1 -fi - -# 检查测试运行器脚本是否存在 -RUNNER_SCRIPT="$TEST_DIR/run_all_tests.sh" -if [ ! 
-f "$RUNNER_SCRIPT" ]; then - echo -e "${RED}错误: 测试运行器脚本 $RUNNER_SCRIPT 不存在${NC}" - exit 1 -fi - -echo -e "${YELLOW}运行MCP服务器测试套件...${NC}" -echo -e "${YELLOW}测试目录: $TEST_DIR${NC}\n" - -# 运行测试 -exec "$RUNNER_SCRIPT" \ No newline at end of file diff --git a/scripts/utilities/backup/backup-all.sh b/scripts/utilities/backup/backup-all.sh deleted file mode 100755 index 70a7685..0000000 --- a/scripts/utilities/backup/backup-all.sh +++ /dev/null @@ -1,233 +0,0 @@ -#!/bin/bash - -# 全量备份脚本 -# 备份所有重要的配置和数据 - -set -euo pipefail - -# 颜色定义 -RED='\033[0;31m' -GREEN='\033[0;32m' -YELLOW='\033[1;33m' -BLUE='\033[0;34m' -NC='\033[0m' # No Color - -# 配置 -BACKUP_DIR="backups/$(date +%Y%m%d_%H%M%S)" -PROJECT_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/../../../" && pwd)" - -# 日志函数 -log_info() { - echo -e "${BLUE}[INFO]${NC} $1" -} - -log_success() { - echo -e "${GREEN}[SUCCESS]${NC} $1" -} - -log_warning() { - echo -e "${YELLOW}[WARNING]${NC} $1" -} - -log_error() { - echo -e "${RED}[ERROR]${NC} $1" -} - -# 创建备份目录 -create_backup_dir() { - log_info "创建备份目录: $BACKUP_DIR" - mkdir -p "$BACKUP_DIR" -} - -# 备份配置文件 -backup_configs() { - log_info "备份配置文件..." - - local config_dirs=( - "components" - "infrastructure/configs" - "security" - "deployment/ansible" - "deployment/terraform" - ) - - for dir in "${config_dirs[@]}"; do - if [ -d "$dir" ]; then - log_info "备份 $dir" - cp -r "$dir" "$BACKUP_DIR/" - else - log_warning "目录不存在: $dir" - fi - done -} - -# 备份脚本 -backup_scripts() { - log_info "备份脚本..." - cp -r scripts "$BACKUP_DIR/" -} - -# 备份环境文件 -backup_env_files() { - log_info "备份环境文件..." - - local env_files=( - ".env" - "mcp_shared_config.json" - "hosts_inventory" - "Makefile" - ) - - for file in "${env_files[@]}"; do - if [ -f "$file" ]; then - log_info "备份 $file" - cp "$file" "$BACKUP_DIR/" - else - log_warning "文件不存在: $file" - fi - done -} - -# 备份 Vault 数据(如果运行中) -backup_vault() { - log_info "检查 Vault 状态..." - - if command -v vault &> /dev/null && vault status &> /dev/null; then - log_info "备份 Vault 数据..." - mkdir -p "$BACKUP_DIR/vault" - - # 备份 Vault 策略 - vault policy list > "$BACKUP_DIR/vault/policies.txt" 2>/dev/null || true - - # 备份 Vault 秘密引擎 - vault secrets list -format=json > "$BACKUP_DIR/vault/secrets_engines.json" 2>/dev/null || true - - log_success "Vault 数据备份完成" - else - log_warning "Vault 未运行或不可访问,跳过 Vault 备份" - fi -} - -# 备份 Consul 数据(如果运行中) -backup_consul() { - log_info "检查 Consul 状态..." - - if command -v consul &> /dev/null && consul members &> /dev/null; then - log_info "备份 Consul 数据..." - mkdir -p "$BACKUP_DIR/consul" - - # 备份 Consul KV 存储 - consul kv export > "$BACKUP_DIR/consul/kv_export.json" 2>/dev/null || true - - # 备份 Consul 服务 - consul catalog services -format=json > "$BACKUP_DIR/consul/services.json" 2>/dev/null || true - - log_success "Consul 数据备份完成" - else - log_warning "Consul 未运行或不可访问,跳过 Consul 备份" - fi -} - -# 创建备份清单 -create_manifest() { - log_info "创建备份清单..." - - cat > "$BACKUP_DIR/MANIFEST.md" << EOF -# 备份清单 - -**备份时间**: $(date) -**备份目录**: $BACKUP_DIR -**项目根目录**: $PROJECT_ROOT - -## 备份内容 - -### 配置文件 -- components/ - 组件配置 -- infrastructure/configs/ - 基础设施配置 -- security/ - 安全配置 -- deployment/ - 部署配置 - -### 脚本文件 -- scripts/ - 所有项目脚本 - -### 环境文件 -- .env - 环境变量 -- mcp_shared_config.json - MCP 配置 -- hosts_inventory - 主机清单 -- Makefile - 构建配置 - -### 服务数据 -- vault/ - Vault 数据(如果可用) -- consul/ - Consul 数据(如果可用) - -## 恢复说明 - -1. 解压备份文件到项目目录 -2. 恢复环境变量: \`source .env\` -3. 重新设置脚本权限: \`find scripts/ -name "*.sh" -exec chmod +x {} \\;\` -4. 
根据需要恢复服务数据 - -## 备份统计 - -**总文件数**: $(find "$BACKUP_DIR" -type f | wc -l) -**总大小**: $(du -sh "$BACKUP_DIR" | cut -f1) -EOF - - log_success "备份清单创建完成" -} - -# 压缩备份 -compress_backup() { - log_info "压缩备份..." - - local archive_name="backup_$(basename "$BACKUP_DIR").tar.gz" - tar -czf "$archive_name" -C "$(dirname "$BACKUP_DIR")" "$(basename "$BACKUP_DIR")" - - log_success "备份已压缩: $archive_name" - log_info "备份大小: $(du -sh "$archive_name" | cut -f1)" - - # 可选:删除未压缩的备份目录 - read -p "是否删除未压缩的备份目录? (y/N): " -n 1 -r - echo - if [[ $REPLY =~ ^[Yy]$ ]]; then - rm -rf "$BACKUP_DIR" - log_info "未压缩的备份目录已删除" - fi -} - -# 清理旧备份 -cleanup_old_backups() { - log_info "清理旧备份..." - - # 保留最近的5个备份 - local backup_count=$(ls -1 backup_*.tar.gz 2>/dev/null | wc -l) - if [ "$backup_count" -gt 5 ]; then - log_info "发现 $backup_count 个备份,保留最新的5个" - ls -1t backup_*.tar.gz | tail -n +6 | xargs rm -f - log_success "旧备份清理完成" - else - log_info "备份数量未超过限制,无需清理" - fi -} - -# 主函数 -main() { - log_info "开始全量备份..." - - cd "$PROJECT_ROOT" - - create_backup_dir - backup_configs - backup_scripts - backup_env_files - backup_vault - backup_consul - create_manifest - compress_backup - cleanup_old_backups - - log_success "全量备份完成!" -} - -# 执行主函数 -main "$@" \ No newline at end of file diff --git a/scripts/utilities/backup/backup-consul.sh b/scripts/utilities/backup/backup-consul.sh deleted file mode 100755 index 3d13632..0000000 --- a/scripts/utilities/backup/backup-consul.sh +++ /dev/null @@ -1,133 +0,0 @@ -#!/bin/bash - -# Consul备份脚本 -# 此脚本用于创建Consul的快照备份,并管理备份文件 - -set -e - -# 配置参数 -CONSUL_ADDR=${CONSUL_ADDR:-"http://localhost:8500"} -BACKUP_DIR=${BACKUP_DIR:-"/backups/consul"} -RETAIN_DAYS=${RETAIN_DAYS:-7} -DATE=$(date +%Y%m%d_%H%M%S) - -# 创建备份目录 -mkdir -p "$BACKUP_DIR" - -echo "Consul备份脚本" -echo "===============" -echo "Consul地址: $CONSUL_ADDR" -echo "备份目录: $BACKUP_DIR" -echo "保留天数: $RETAIN_DAYS" -echo "备份时间: $DATE" -echo "" - -# 检查Consul连接 -check_consul_connection() { - echo "检查Consul连接..." - if curl -s "$CONSUL_ADDR/v1/status/leader" > /dev/null; then - echo "✓ Consul连接正常" - else - echo "✗ 无法连接到Consul,请检查Consul服务是否运行" - exit 1 - fi -} - -# 创建快照备份 -create_snapshot() { - echo "创建Consul快照备份..." - - SNAPSHOT_FILE="${BACKUP_DIR}/consul-snapshot-${DATE}.snap" - - # 使用Consul API创建快照 - if curl -s "${CONSUL_ADDR}/v1/snapshot" > "$SNAPSHOT_FILE"; then - echo "✓ 快照备份创建成功: $SNAPSHOT_FILE" - - # 显示快照信息 - echo "快照信息:" - consul snapshot inspect "$SNAPSHOT_FILE" 2>/dev/null || echo " (需要安装consul客户端以查看快照信息)" - else - echo "✗ 快照备份创建失败" - exit 1 - fi -} - -# 清理旧备份 -cleanup_old_backups() { - echo "清理${RETAIN_DAYS}天前的备份..." - - # 查找并删除旧备份文件 - if find "$BACKUP_DIR" -name "consul-snapshot-*.snap" -mtime +$RETAIN_DAYS -delete; then - echo "✓ 旧备份清理完成" - else - echo " 没有找到需要清理的旧备份" - fi -} - -# 列出所有备份 -list_backups() { - echo "" - echo "当前备份列表:" - echo "=============" - - if [ -d "$BACKUP_DIR" ] && [ "$(ls -A "$BACKUP_DIR")" ]; then - ls -lah "$BACKUP_DIR"/consul-snapshot-*.snap | awk '{print $5, $6, $7, $8, $9}' - else - echo " 没有找到备份文件" - fi -} - -# 验证备份 -verify_backup() { - echo "" - echo "验证备份..." 
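-# 补充说明(非原脚本内容):若备份目录中没有任何快照文件,下面的 ls 会以非零状态退出,
-# 配合 set -e 将直接终止脚本;必要时可改为 ls -t ... 2>/dev/null | head -n 1 并对结果判空。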
- - LATEST_BACKUP=$(ls -t "$BACKUP_DIR"/consul-snapshot-*.snap | head -n 1) - - if [ -n "$LATEST_BACKUP" ]; then - echo "验证最新备份: $LATEST_BACKUP" - - # 检查文件大小 - FILE_SIZE=$(du -h "$LATEST_BACKUP" | cut -f1) - echo "备份文件大小: $FILE_SIZE" - - # 检查文件是否为空 - if [ -s "$LATEST_BACKUP" ]; then - echo "✓ 备份文件不为空" - else - echo "✗ 备份文件为空" - exit 1 - fi - - # 尝试检查快照元数据 - if consul snapshot inspect "$LATEST_BACKUP" > /dev/null 2>&1; then - echo "✓ 备份文件格式正确" - else - echo "✗ 备份文件格式错误" - exit 1 - fi - else - echo "✗ 没有找到备份文件" - exit 1 - fi -} - -# 主函数 -main() { - check_consul_connection - create_snapshot - cleanup_old_backups - list_backups - verify_backup - - echo "" - echo "✓ 备份流程完成!" - echo "" - echo "使用说明:" - echo "1. 可以通过cron定期运行此脚本: 0 2 * * * /path/to/backup_consul.sh" - echo "2. 恢复备份使用: consul snapshot restore /path/to/consul-snapshot-YYYYMMDD_HHMMSS.snap" - echo "3. 查看备份内容: consul snapshot inspect /path/to/consul-snapshot-YYYYMMDD_HHMMSS.snap" -} - -# 执行主函数 -main "$@" \ No newline at end of file diff --git a/scripts/utilities/helpers/fix-alpine-cgroups-systemd.sh b/scripts/utilities/helpers/fix-alpine-cgroups-systemd.sh deleted file mode 100755 index 0c2849f..0000000 --- a/scripts/utilities/helpers/fix-alpine-cgroups-systemd.sh +++ /dev/null @@ -1,124 +0,0 @@ -#!/bin/bash -# Alternative script to fix cgroup configuration using systemd approach - -echo "🔧 Fixing cgroup configuration using systemd approach..." - -# Check if running as root -if [ "$(id -u)" -ne 0 ]; then - echo "❌ This script must be run as root" - exit 1 -fi - -# Update package lists -echo "📦 Updating package lists..." -apk update - -# Install necessary packages -echo "📦 Installing necessary packages..." -apk add systemd openrc - -# Create systemd cgroup configuration -echo "🔧 Creating systemd cgroup configuration..." -mkdir -p /etc/systemd -cat > /etc/systemd/system.conf << 'EOF' -[Manager] -DefaultControllers=cpu cpuacct memory devices freezer net_cls blkio cpuset perf_event pids -EOF - -# Create systemd cgroup mount configuration -echo "🔧 Creating systemd cgroup mount configuration..." -mkdir -p /etc/systemd/system -cat > /etc/systemd/system/sys-fs-cgroup.mount << 'EOF' -[Unit] -Description=Control Group Hierarchy -DefaultDependencies=no -Before=sysinit.target -ConditionPathExists=/sys/fs/cgroup - -[Mount] -What=cgroup -Where=/sys/fs/cgroup -Type=cgroup -Options=nosuid,noexec,nodev - -[Install] -WantedBy=sysinit.target -EOF - -# Create systemd service to set up cgroups -echo "🔧 Creating systemd service to set up cgroups..." -cat > /etc/systemd/system/setup-cgroups.service << 'EOF' -[Unit] -Description=Set up cgroups -After=sys-fs-cgroup.mount -DefaultDependencies=no - -[Service] -Type=oneshot -ExecStart=/bin/sh -c 'for subsystem in cpu cpuacct memory devices freezer net_cls blkio cpuset perf_event pids; do mkdir -p /sys/fs/cgroup/$subsystem; mount -t cgroup cgroup /sys/fs/cgroup/$subsystem; done' -RemainAfterExit=yes - -[Install] -WantedBy=sysinit.target -EOF - -# Enable systemd services -echo "🚀 Enabling systemd services..." -systemctl enable sys-fs-cgroup.mount -systemctl enable setup-cgroups.service - -# Create a script to manually set up cgroups -echo "🔧 Creating manual cgroup setup script..." -cat > /usr/local/bin/setup-cgroups-manual.sh << 'EOF' -#!/bin/bash -# Manual cgroup setup script - -# Mount cgroup filesystem if not already mounted -if ! mountpoint -q /sys/fs/cgroup; then - echo "Mounting cgroup filesystem..." 
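-# Note (added comment, not in the original script): this sets up the legacy
-# cgroup v1 hierarchy; on hosts booted with the unified cgroup v2 filesystem
-# (cgroup2 mounted on /sys/fs/cgroup) these mounts may fail.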
- mount -t cgroup cgroup /sys/fs/cgroup -fi - -# Set up all cgroup subsystems -for subsystem in cpu cpuacct memory devices freezer net_cls blkio cpuset perf_event pids; do - if [ ! -d "/sys/fs/cgroup/$subsystem" ]; then - mkdir -p "/sys/fs/cgroup/$subsystem" - fi - if ! mountpoint -q "/sys/fs/cgroup/$subsystem"; then - echo "Mounting $subsystem subsystem..." - mount -t cgroup cgroup "/sys/fs/cgroup/$subsystem" - fi -done - -# Verify pids subsystem is available -if [ -d /sys/fs/cgroup/pids ]; then - echo "✅ PIDS cgroup subsystem is available" -else - echo "❌ PIDS cgroup subsystem is not available" -fi -EOF - -chmod +x /usr/local/bin/setup-cgroups-manual.sh - -# Create a script to start container with proper cgroup settings -echo "🔧 Creating container startup script..." -cat > /usr/local/bin/start-qdrant-container.sh << 'EOF' -#!/bin/bash -# Script to start Qdrant container with proper cgroup settings - -# Set up cgroups first -/usr/local/bin/setup-cgroups-manual.sh - -# Start the container -echo "Starting Qdrant container..." -podman run -p 6333:6333 \ - -v $(pwd)/data:/qdrant/storage \ - hub.git4ta.fun/qdrant/qdrant -EOF - -chmod +x /usr/local/bin/start-qdrant-container.sh - -echo "✅ Systemd cgroup configuration complete!" -echo "🔄 Please reboot the system to ensure all changes take effect" -echo "After reboot, you can use '/usr/local/bin/start-qdrant-container.sh' to start your container" -echo "Alternatively, you can run '/usr/local/bin/setup-cgroups-manual.sh' before starting the container manually" \ No newline at end of file diff --git a/scripts/utilities/helpers/fix-alpine-cgroups.sh b/scripts/utilities/helpers/fix-alpine-cgroups.sh deleted file mode 100755 index cfcf228..0000000 --- a/scripts/utilities/helpers/fix-alpine-cgroups.sh +++ /dev/null @@ -1,112 +0,0 @@ -#!/bin/bash -# Script to fix cgroup configuration for container runtime in Alpine Linux - -echo "🔧 Fixing cgroup configuration for container runtime..." - -# Check if running as root -if [ "$(id -u)" -ne 0 ]; then - echo "❌ This script must be run as root" - exit 1 -fi - -# Update package lists -echo "📦 Updating package lists..." -apk update - -# Install necessary packages for cgroup management -echo "📦 Installing cgroup-related packages..." -apk add cgroup-tools cgroupfs-mount - -# Create cgroup mount points -echo "🔧 Creating cgroup mount points..." -mkdir -p /sys/fs/cgroup/{cpu,cpuacct,memory,devices,freezer,net_cls,blkio,cpuset,perf_event,pids} - -# Mount cgroup filesystems -echo "🔧 Mounting cgroup filesystems..." -mount -t cgroup cgroup /sys/fs/cgroup -mount -t cgroup cgroup /sys/fs/cgroup/cpu -mount -t cgroup cgroup /sys/fs/cgroup/cpuacct -mount -t cgroup cgroup /sys/fs/cgroup/memory -mount -t cgroup cgroup /sys/fs/cgroup/devices -mount -t cgroup cgroup /sys/fs/cgroup/freezer -mount -t cgroup cgroup /sys/fs/cgroup/net_cls -mount -t cgroup cgroup /sys/fs/cgroup/blkio -mount -t cgroup cgroup /sys/fs/cgroup/cpuset -mount -t cgroup cgroup /sys/fs/cgroup/perf_event -mount -t cgroup cgroup /sys/fs/cgroup/pids - -# Add cgroup mounts to /etc/fstab for persistence -echo "💾 Adding cgroup mounts to /etc/fstab..." 
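-# Note (added comment, not in the original script): this appends unconditionally,
-# so running the script more than once will duplicate the entries; grep /etc/fstab
-# for an existing entry first if the script needs to be idempotent.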
-cat >> /etc/fstab << EOF -# Cgroup mounts for container runtime -cgroup /sys/fs/cgroup cgroup defaults 0 0 -cgroup /sys/fs/cgroup/cpu cgroup defaults 0 0 -cgroup /sys/fs/cgroup/cpuacct cgroup defaults 0 0 -cgroup /sys/fs/cgroup/memory cgroup defaults 0 0 -cgroup /sys/fs/cgroup/devices cgroup defaults 0 0 -cgroup /sys/fs/cgroup/freezer cgroup defaults 0 0 -cgroup /sys/fs/cgroup/net_cls cgroup defaults 0 0 -cgroup /sys/fs/cgroup/blkio cgroup defaults 0 0 -cgroup /sys/fs/cgroup/cpuset cgroup defaults 0 0 -cgroup /sys/fs/cgroup/perf_event cgroup defaults 0 0 -cgroup /sys/fs/cgroup/pids cgroup defaults 0 0 -EOF - -# Enable and start cgroup service if available -if [ -f /etc/init.d/cgroups ]; then - echo "🚀 Enabling and starting cgroups service..." - rc-update add cgroups boot - rc-service cgroups start -fi - -# Create a script to set up cgroups on boot -echo "🔧 Creating cgroup setup script..." -cat > /usr/local/bin/setup-cgroups.sh << 'EOF' -#!/bin/bash -# Script to set up cgroups on boot - -# Mount cgroup filesystems if not already mounted -if ! mountpoint -q /sys/fs/cgroup; then - mount -t cgroup cgroup /sys/fs/cgroup -fi - -# Ensure all cgroup subsystems are mounted -for subsystem in cpu cpuacct memory devices freezer net_cls blkio cpuset perf_event pids; do - if [ ! -d "/sys/fs/cgroup/$subsystem" ]; then - mkdir -p "/sys/fs/cgroup/$subsystem" - fi - if ! mountpoint -q "/sys/fs/cgroup/$subsystem"; then - mount -t cgroup cgroup "/sys/fs/cgroup/$subsystem" - fi -done -EOF - -chmod +x /usr/local/bin/setup-cgroups.sh - -# Add the script to local.d to run on boot -echo "🚀 Adding cgroup setup script to boot sequence..." -mkdir -p /etc/local.d -echo "/usr/local/bin/setup-cgroups.sh" > /etc/local.d/cgroups.start -chmod +x /etc/local.d/cgroups.start - -# Enable local.d service -rc-update add local default - -# Verify cgroup setup -echo "✅ Verifying cgroup setup..." -if mountpoint -q /sys/fs/cgroup; then - echo "✅ Cgroup filesystem is mounted" -else - echo "❌ Cgroup filesystem is not mounted" -fi - -# Check if pids subsystem is available -if [ -d /sys/fs/cgroup/pids ]; then - echo "✅ PIDS cgroup subsystem is available" -else - echo "❌ PIDS cgroup subsystem is not available" -fi - -echo "🎉 Cgroup configuration complete!" -echo "🔄 Please reboot the system to ensure all changes take effect" -echo "After reboot, you should be able to run your container successfully." 
\ No newline at end of file
diff --git a/scripts/utilities/helpers/manage-vault-consul.sh b/scripts/utilities/helpers/manage-vault-consul.sh
deleted file mode 100755
index 562e22d..0000000
--- a/scripts/utilities/helpers/manage-vault-consul.sh
+++ /dev/null
@@ -1,196 +0,0 @@
-#!/bin/bash
-# Vault与Consul集成管理脚本
-
-# 颜色定义
-GREEN='\033[0;32m'
-YELLOW='\033[1;33m'
-RED='\033[0;31m'
-NC='\033[0m' # No Color
-
-# 函数定义
-log_info() {
-    echo -e "${GREEN}[INFO]${NC} $1"
-}
-
-log_warn() {
-    echo -e "${YELLOW}[WARN]${NC} $1"
-}
-
-log_error() {
-    echo -e "${RED}[ERROR]${NC} $1"
-}
-
-# 分节标题输出(下文会调用此函数,这里补一个最小实现)
-echo_section() {
-    echo ""
-    echo "===== $1 ====="
-}
-
-# 显示帮助信息
-show_help() {
-    echo "用法: $0 [选项]"
-    echo "选项:"
-    echo "  status    显示Vault和Consul状态"
-    echo "  verify    验证集成状态"
-    echo "  backup    备份Consul中的Vault数据"
-    echo "  restore   从备份恢复Consul中的Vault数据"
-    echo "  monitor   监控Vault和Consul运行状态"
-    echo "  health    检查健康状态"
-    echo "  help      显示此帮助信息"
-}
-
-# 显示Vault和Consul状态
-show_status() {
-    log_info "Vault状态:"
-    source /root/mgmt/security/secrets/vault/dev/vault_env.sh
-    vault status
-
-    echo ""
-    log_info "Consul成员状态:"
-    consul members
-
-    echo ""
-    log_info "Consul中的Vault数据键数量:"
-    curl -s http://100.117.106.136:8500/v1/kv/vault/?keys | jq length
-}
-
-# 验证集成状态
-verify_integration() {
-    /root/mgmt/deployment/scripts/verify_vault_consul_integration.sh
-}
-
-# 备份Vault数据(存储在Consul中)
-backup_vault_data() {
-    log_info "开始备份Consul中的Vault数据..."
-
-    BACKUP_DIR="/root/mgmt/security/secrets/vault/backups"
-    TIMESTAMP=$(date +%Y%m%d_%H%M%S)
-    BACKUP_FILE="$BACKUP_DIR/vault_consul_backup_$TIMESTAMP.json"
-
-    mkdir -p "$BACKUP_DIR"
-
-    # 获取所有Vault相关的键
-    keys=$(curl -s http://100.117.106.136:8500/v1/kv/vault/?recurse | jq -r '.[].Key')
-
-    if [ -n "$keys" ]; then
-        # 创建备份数据结构
-        echo '{"backup_timestamp": "'$(date -Iseconds)'", "vault_data": []}' > "$BACKUP_FILE"
-
-        # 备份每个键的值
-        while IFS= read -r key; do
-            value=$(curl -s http://100.117.106.136:8500/v1/kv/$key | jq -r '.[0].Value' | base64 -d | base64)
-            jq --arg key "$key" --arg value "$value" '.vault_data += [{"key": $key, "value": $value}]' "$BACKUP_FILE" > "$BACKUP_FILE.tmp" && mv "$BACKUP_FILE.tmp" "$BACKUP_FILE"
-        done <<< "$keys"
-
-        log_info "✓ Vault数据已备份到: $BACKUP_FILE"
-        log_warn "注意:这是未加密的备份,请确保安全存储"
-    else
-        log_error "✗ 无法获取Consul中的Vault数据"
-    fi
-}
-
-# 远程管理功能演示
-remote_management_demo() {
-    echo_section "HashiCorp 产品远程管理能力演示"
-
-    log_info "1. Consul 远程管理演示"
-
-    # 查看 Consul 集群成员
-    log_info "查看 Consul 集群成员:"
-    consul members || log_warn "无法获取集群成员信息"
-
-    # 查看 Consul 数据中心信息
-    log_info "查看 Consul 数据中心信息:"
-    consul info | grep -E "(datacenter|server|client)" || log_warn "无法获取数据中心信息"
-
-    # 在 Consul 中存储和读取键值
-    log_info "在 Consul 中存储测试键值:"
-    echo "测试值" | consul kv put demo/test/value -
-    log_info "从 Consul 读取测试键值:"
-    consul kv get demo/test/value || log_warn "无法读取键值"
-
-    log_info "2. Vault 远程管理演示"
-
-    # 检查 Vault 状态
-    log_info "检查 Vault 状态:"
-    vault status || log_warn "无法连接到 Vault 或 Vault 未初始化"
-
-    # 列出 Vault 密钥引擎
-    log_info "列出 Vault 密钥引擎:"
-    vault secrets list || log_warn "无法列出密钥引擎"
-
-    # 在 Vault 中写入和读取密钥
-    log_info "在 Vault 中存储测试密钥:"
-    echo "测试数据" | vault kv put secret/demo/test value=-
-    log_info "从 Vault 读取测试密钥:"
-    vault kv get secret/demo/test || log_warn "无法读取密钥"
-
-    # 查看 Vault 集群信息
-    log_info "查看 Vault 集群信息:"
-    vault operator raft list-peers || log_warn "无法列出 Raft 集群节点"
-
-    log_info "远程管理功能演示完成"
-    log_info "请根据实际环境配置正确的地址和认证凭据"
-}
-
-# 简单监控:定期执行健康检查(monitor 分支会调用此函数,
-# 这里补一个最小实现;检查间隔为假设值,可按需调整)
-monitor_status() {
-    while true; do
-        health_check
-        sleep 30
-    done
-}
-
-# 健康检查
-health_check() {
-    log_info "执行健康检查..."
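-    # 补充说明(非原脚本内容):/v1/sys/health 返回 JSON,其中 initialized/sealed 为布尔字段;
-    # 下面用 grep 做字符串匹配,若环境中已有 jq,也可用 jq -r '.initialized,.sealed' 解析,更稳健。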
- - # Vault健康检查 - vault_health=$(curl -s http://100.117.106.136:8200/v1/sys/health) - if echo "$vault_health" | grep -q '"initialized":true'; then - log_info "✓ Vault已初始化" - else - log_error "✗ Vault未初始化" - fi - - if echo "$vault_health" | grep -q '"sealed":false'; then - log_info "✓ Vault未密封" - else - log_error "✗ Vault已密封" - fi - - # Consul健康检查 - consul_health=$(curl -s http://100.117.106.136:8500/v1/status/leader) - if [ -n "$consul_health" ] && [ "$consul_health" != "null" ]; then - log_info "✓ Consul集群有领导者" - else - log_error "✗ Consul集群无领导者" - fi - - # 检查Vault数据 - vault_data_check=$(curl -s http://100.117.106.136:8500/v1/kv/vault/core/seal-config 2>/dev/null | jq length 2>/dev/null) - if [ -n "$vault_data_check" ] && [ "$vault_data_check" -gt 0 ]; then - log_info "✓ Vault核心数据存在" - else - log_error "✗ Vault核心数据缺失" - fi - - log_info "健康检查完成" -} - -# 主程序 -case "$1" in - status) - show_status - ;; - verify) - verify_integration - ;; - backup) - backup_vault_data - ;; - monitor) - monitor_status - ;; - health) - health_check - ;; - help|--help|-h) - show_help - ;; - *) - if [ -z "$1" ]; then - show_help - else - log_error "未知选项: $1" - show_help - exit 1 - fi - ;; -esac \ No newline at end of file diff --git a/scripts/utilities/helpers/nomad-leader-discovery.sh b/scripts/utilities/helpers/nomad-leader-discovery.sh deleted file mode 100755 index e2177c7..0000000 --- a/scripts/utilities/helpers/nomad-leader-discovery.sh +++ /dev/null @@ -1,193 +0,0 @@ -#!/bin/bash - -# Nomad 集群领导者发现与访问脚本 -# 此脚本自动发现当前 Nomad 集群领导者并执行相应命令 - -# 默认服务器列表(可根据实际情况修改) -SERVERS=( - "100.116.158.95" # bj-semaphore - "100.81.26.3" # ash1d - "100.103.147.94" # ash2e - "100.90.159.68" # ch2 - "100.86.141.112" # ch3 - "100.98.209.50" # bj-onecloud1 - "100.120.225.29" # de -) - -# 超时设置(秒) -TIMEOUT=5 - -# 颜色输出 -RED='\033[0;31m' -GREEN='\033[0;32m' -YELLOW='\033[1;33m' -NC='\033[0m' # No Color - -# 打印帮助信息 -function show_help() { - echo "Nomad 集群领导者发现与访问脚本" - echo "" - echo "用法: $0 [选项] [nomad命令]" - echo "" - echo "选项:" - echo " -h, --help 显示此帮助信息" - echo " -s, --server IP 指定初始服务器IP" - echo " -t, --timeout SECS 设置超时时间(默认: $TIMEOUT 秒)" - echo " -l, --list-servers 列出所有配置的服务器" - echo " -c, --check-leader 仅检查领导者,不执行命令" - echo "" - echo "示例:" - echo " $0 node status # 使用自动发现的领导者查看节点状态" - echo " $0 -s 100.116.158.95 job status # 指定初始服务器查看作业状态" - echo " $0 -c # 仅检查当前领导者" - echo "" -} - -# 列出所有配置的服务器 -function list_servers() { - echo -e "${YELLOW}配置的服务器列表:${NC}" - for server in "${SERVERS[@]}"; do - echo " - $server" - done -} - -# 发现领导者 -function discover_leader() { - local initial_server=$1 - - # 如果指定了初始服务器,先尝试使用它 - if [ -n "$initial_server" ]; then - echo -e "${YELLOW}尝试从服务器 $initial_server 发现领导者...${NC}" >&2 - leader=$(curl -s --max-time $TIMEOUT "http://${initial_server}:4646/v1/status/leader" 2>/dev/null | sed 's/"//g') - if [ -n "$leader" ] && [ "$leader" != "" ]; then - # 将RPC端口(4647)替换为HTTP端口(4646) - leader=$(echo "$leader" | sed 's/:4647$/:4646/') - echo -e "${GREEN}发现领导者: $leader${NC}" >&2 - echo "$leader" - return 0 - fi - echo -e "${RED}无法从 $initial_server 获取领导者信息${NC}" >&2 - fi - - # 遍历所有服务器尝试发现领导者 - echo -e "${YELLOW}遍历所有服务器寻找领导者...${NC}" >&2 - for server in "${SERVERS[@]}"; do - echo -n " 检查 $server ... 
" >&2 - leader=$(curl -s --max-time $TIMEOUT "http://${server}:4646/v1/status/leader" 2>/dev/null | sed 's/"//g') - if [ -n "$leader" ] && [ "$leader" != "" ]; then - # 将RPC端口(4647)替换为HTTP端口(4646) - leader=$(echo "$leader" | sed 's/:4647$/:4646/') - echo -e "${GREEN}成功${NC}" >&2 - echo -e "${GREEN}发现领导者: $leader${NC}" >&2 - echo "$leader" - return 0 - else - echo -e "${RED}失败${NC}" >&2 - fi - done - - echo -e "${RED}无法发现领导者,请检查集群状态${NC}" >&2 - return 1 -} - -# 解析命令行参数 -INITIAL_SERVER="" -CHECK_LEADER_ONLY=false -NOMAD_COMMAND=() - -while [[ $# -gt 0 ]]; do - case $1 in - -h|--help) - show_help - exit 0 - ;; - -s|--server) - INITIAL_SERVER="$2" - shift 2 - ;; - -t|--timeout) - TIMEOUT="$2" - shift 2 - ;; - -l|--list-servers) - list_servers - exit 0 - ;; - -c|--check-leader) - CHECK_LEADER_ONLY=true - shift - ;; - *) - NOMAD_COMMAND+=("$1") - shift - ;; - esac -done - -# 主逻辑 -echo -e "${YELLOW}Nomad 集群领导者发现与访问脚本${NC}" >&2 -echo "==================================" >&2 - -# 发现领导者 -LEADER=$(discover_leader "$INITIAL_SERVER") -if [ $? -ne 0 ]; then - exit 1 -fi - -# 提取领导者IP和端口 -LEADER_IP=$(echo "$LEADER" | cut -d':' -f1) -LEADER_PORT=$(echo "$LEADER" | cut -d':' -f2) - -# 如果仅检查领导者,则退出 -if [ "$CHECK_LEADER_ONLY" = true ]; then - echo -e "${GREEN}当前领导者: $LEADER${NC}" >&2 - exit 0 -fi - -# 如果没有指定命令,显示交互式菜单 -if [ ${#NOMAD_COMMAND[@]} -eq 0 ]; then - echo -e "${YELLOW}未指定命令,请选择要执行的操作:${NC}" >&2 - echo "1) 查看节点状态" >&2 - echo "2) 查看作业状态" >&2 - echo "3) 查看服务器成员" >&2 - echo "4) 查看集群状态" >&2 - echo "5) 自定义命令" >&2 - echo "0) 退出" >&2 - - read -p "请输入选项 (0-5): " choice - - case $choice in - 1) NOMAD_COMMAND=("node" "status") ;; - 2) NOMAD_COMMAND=("job" "status") ;; - 3) NOMAD_COMMAND=("server" "members") ;; - 4) NOMAD_COMMAND=("operator" "raft" "list-peers") ;; - 5) - read -p "请输入完整的 Nomad 命令: " -a NOMAD_COMMAND - ;; - 0) exit 0 ;; - *) - echo -e "${RED}无效选项${NC}" >&2 - exit 1 - ;; - esac -fi - -# 执行命令 -echo -e "${YELLOW}执行命令: nomad ${NOMAD_COMMAND[*]} -address=http://${LEADER}${NC}" >&2 -nomad "${NOMAD_COMMAND[@]}" -address="http://${LEADER}" - -# 检查命令执行结果 -if [ $? -eq 0 ]; then - echo -e "${GREEN}命令执行成功${NC}" >&2 -else - echo -e "${RED}命令执行失败,可能需要重新发现领导者${NC}" >&2 - echo -e "${YELLOW}尝试重新发现领导者...${NC}" >&2 - NEW_LEADER=$(discover_leader) - if [ $? -eq 0 ] && [ "$NEW_LEADER" != "$LEADER" ]; then - echo -e "${YELLOW}领导者已更改,重新执行命令...${NC}" >&2 - nomad "${NOMAD_COMMAND[@]}" -address="http://${NEW_LEADER}" - else - echo -e "${RED}无法恢复,请检查集群状态${NC}" >&2 - exit 1 - fi -fi \ No newline at end of file diff --git a/scripts/utilities/helpers/show-vault-dev-keys.sh b/scripts/utilities/helpers/show-vault-dev-keys.sh deleted file mode 100755 index 84b0c76..0000000 --- a/scripts/utilities/helpers/show-vault-dev-keys.sh +++ /dev/null @@ -1,32 +0,0 @@ -#!/bin/bash -# 显示开发环境Vault密钥信息 - -echo "===== Vault开发环境密钥信息 =====" - -# 检查密钥文件是否存在 -if [ ! 
-f "/root/mgmt/security/secrets/vault/dev/init_keys.json" ]; then - echo "错误:Vault密钥文件不存在" - echo "请先运行初始化脚本:/root/mgmt/deployment/scripts/init_vault_dev.sh" - exit 1 -fi - -# 显示密钥信息 -echo "Vault开发环境密钥信息:" -echo "----------------------------------------" - -# 提取并显示解封密钥 -unseal_key=$(cat /root/mgmt/security/secrets/vault/dev/init_keys.json | grep -o '"unseal_keys_b64":\["[^"]*"' | cut -d '"' -f4) -echo "解封密钥: $unseal_key" - -# 提取并显示根令牌 -root_token=$(cat /root/mgmt/security/secrets/vault/dev/init_keys.json | grep -o '"root_token":"[^"]*"' | cut -d '"' -f4) -echo "根令牌: $root_token" - -echo "----------------------------------------" -echo "环境变量设置命令:" -echo "export VAULT_ADDR='http://100.117.106.136:8200'" -echo "export VAULT_TOKEN='$root_token'" - -echo "" -echo "注意:这是开发环境配置,仅用于测试目的" -echo "生产环境请遵循安全策略文档中的建议" \ No newline at end of file diff --git a/scripts/utilities/maintenance/cleanup-global-config.sh b/scripts/utilities/maintenance/cleanup-global-config.sh deleted file mode 100755 index bc18d50..0000000 --- a/scripts/utilities/maintenance/cleanup-global-config.sh +++ /dev/null @@ -1,170 +0,0 @@ -#!/bin/bash - -# Nomad Global 配置清理脚本 -# 此脚本用于移除配置文件中的 .global 后缀 - -set -e - -# 颜色输出 -RED='\033[0;31m' -GREEN='\033[0;32m' -YELLOW='\033[1;33m' -NC='\033[0m' # No Color - -# 日志函数 -log() { - echo -e "${GREEN}[$(date '+%Y-%m-%d %H:%M:%S')]${NC} $1" -} - -warn() { - echo -e "${YELLOW}[$(date '+%Y-%m-%d %H:%M:%S')]${NC} $1" -} - -error() { - echo -e "${RED}[$(date '+%Y-%m-%d %H:%M:%S')]${NC} $1" -} - -# 备份文件函数 -backup_file() { - local file=$1 - if [ -f "$file" ]; then - cp "$file" "${file}.backup.$(date +%Y%m%d_%H%M%S)" - log "已备份文件: $file" - fi -} - -# 清理 Ansible 配置文件中的 .global 后缀 -cleanup_ansible_configs() { - log "开始清理 Ansible 配置文件..." - - # 处理 configure-nomad-clients.yml - local client_config="/root/mgmt/deployment/ansible/playbooks/configure-nomad-clients.yml" - if [ -f "$client_config" ]; then - backup_file "$client_config" - sed -i 's/\.global//g' "$client_config" - log "已清理 configure-nomad-clients.yml" - fi - - # 处理 deploy-korean-nodes.yml - local korean_config="/root/mgmt/deployment/ansible/playbooks/deploy-korean-nodes.yml" - if [ -f "$korean_config" ]; then - backup_file "$korean_config" - sed -i 's/\.global//g' "$korean_config" - log "已清理 deploy-korean-nodes.yml" - fi - - # 处理 update_ch2_nomad_name*.yml - for file in /root/mgmt/deployment/ansible/update_ch2_nomad_name*.yml; do - if [ -f "$file" ]; then - backup_file "$file" - sed -i 's/name = "ch2\.global\.global"/name = "ch2"/g' "$file" - sed -i 's/hosts: ch2\.global/hosts: ch2/g' "$file" - log "已清理 $file" - fi - done - - # 处理其他包含 .global 的 Ansible 文件 - find /root/mgmt/deployment/ansible -name "*.yml" -o -name "*.yaml" | while read file; do - if grep -q "\.global" "$file"; then - backup_file "$file" - sed -i 's/\.global//g' "$file" - log "已清理 $file" - fi - done -} - -# 清理 inventory 文件中的 .global 后缀 -cleanup_inventory_files() { - log "开始清理 inventory 文件..." - - # 处理所有 inventory 文件 - find /root/mgmt/deployment/ansible/inventories -name "*.ini" | while read file; do - if grep -q "\.global" "$file"; then - backup_file "$file" - sed -i 's/\.global//g' "$file" - log "已清理 inventory 文件: $file" - fi - done -} - -# 清理脚本文件中的 .global 后缀 -cleanup_script_files() { - log "开始清理脚本文件..." 
-
-    # 处理 nomad-leader-discovery.sh
-    local script_file="/root/mgmt/deployment/scripts/nomad-leader-discovery.sh"
-    if [ -f "$script_file" ]; then
-        backup_file "$script_file"
-        sed -i 's/\.global//g' "$script_file"
-        log "已清理 nomad-leader-discovery.sh"
-    fi
-}
-
-# 更新 Nomad 配置模板中的 region 设置
-update_nomad_templates() {
-    log "开始更新 Nomad 配置模板..."
-
-    # 处理 OpenTofu 模板
-    local template_file="/root/mgmt/infrastructure/opentofu/modules/nomad-cluster/templates/nomad-userdata.sh"
-    if [ -f "$template_file" ]; then
-        backup_file "$template_file"
-        sed -i 's/region = "global"/region = "dc1"/g' "$template_file"
-        log "已更新 Nomad 配置模板中的 region 设置"
-    fi
-
-    # 处理其他可能的模板文件
-    find /root/mgmt -name "*.hcl" -o -name "*.sh" | while read file; do
-        if grep -q 'region = "global"' "$file"; then
-            backup_file "$file"
-            sed -i 's/region = "global"/region = "dc1"/g' "$file"
-            log "已更新 $file 中的 region 设置"
-        fi
-    done
-}
-
-# 验证修改结果
-verify_changes() {
-    log "验证修改结果..."
-
-    # 检查是否还有 .global 后缀
-    local global_count=$(grep -r "\.global" /root/mgmt --include="*.yml" --include="*.yaml" --include="*.ini" --include="*.sh" --include="*.hcl" | grep -v cleanup-global-config.sh | wc -l)
-    if [ "$global_count" -eq 0 ]; then
-        log "✅ 所有 .global 后缀已成功移除"
-    else
-        warn "仍有一些文件包含 .global 后缀,请手动检查"
-        grep -r "\.global" /root/mgmt --include="*.yml" --include="*.yaml" --include="*.ini" --include="*.sh" --include="*.hcl" | grep -v cleanup-global-config.sh || true
-    fi
-
-    # 检查是否还有残留的 region = "global" 设置
-    local region_count=$(grep -r 'region = "global"' /root/mgmt --include="*.hcl" --include="*.sh" | grep -v cleanup-global-config.sh | wc -l)
-    if [ "$region_count" -eq 0 ]; then
-        log "✅ 所有 region 'global' 设置已更新"
-    else
-        warn "仍有一些 region 设置为 'global',请手动检查"
-        grep -r 'region = "global"' /root/mgmt --include="*.hcl" --include="*.sh" | grep -v cleanup-global-config.sh || true
-    fi
-}
-
-# 主函数
-main() {
-    log "开始执行 Nomad Global 配置清理..."
-
-    # 创建备份目录
-    mkdir -p /root/mgmt/backups/global_cleanup
-    log "已创建备份目录: /root/mgmt/backups/global_cleanup"
-
-    # 执行清理操作
-    cleanup_ansible_configs
-    cleanup_inventory_files
-    cleanup_script_files
-    update_nomad_templates
-
-    # 验证修改结果
-    verify_changes
-
-    log "Nomad Global 配置清理完成!"
- log "请检查备份文件并重新部署相关配置" -} - -# 执行主函数 -main "$@" \ No newline at end of file diff --git a/templates/nomad-client.j2 b/templates/nomad-client.j2 new file mode 100644 index 0000000..6a668ad --- /dev/null +++ b/templates/nomad-client.j2 @@ -0,0 +1,81 @@ +datacenter = "dc1" +data_dir = "/opt/nomad/data" +plugin_dir = "/opt/nomad/plugins" +log_level = "INFO" +name = "{{ inventory_hostname }}" + +bind_addr = "{{ inventory_hostname }}.tailnet-68f9.ts.net" + +addresses { + http = "{{ inventory_hostname }}.tailnet-68f9.ts.net" + rpc = "{{ inventory_hostname }}.tailnet-68f9.ts.net" + serf = "{{ inventory_hostname }}.tailnet-68f9.ts.net" +} + +advertise { + http = "{{ inventory_hostname }}.tailnet-68f9.ts.net:4646" + rpc = "{{ inventory_hostname }}.tailnet-68f9.ts.net:4647" + serf = "{{ inventory_hostname }}.tailnet-68f9.ts.net:4648" +} + +ports { + http = 4646 + rpc = 4647 + serf = 4648 +} + +server { + enabled = false +} + +client { + enabled = true + network_interface = "tailscale0" + + servers = [ + "semaphore.tailnet-68f9.ts.net:4647", + "ash1d.tailnet-68f9.ts.net:4647", + "ash2e.tailnet-68f9.ts.net:4647", + "ch2.tailnet-68f9.ts.net:4647", + "ch3.tailnet-68f9.ts.net:4647", + "onecloud1.tailnet-68f9.ts.net:4647", + "de.tailnet-68f9.ts.net:4647" + ] + + meta { + consul = "true" + consul_version = "1.21.5" + consul_server = "{% if inventory_hostname in ['master', 'ash3c', 'warden'] %}true{% else %}false{% endif %}" + } + + # 激进的垃圾清理策略 + gc_interval = "5m" + gc_disk_usage_threshold = 80 + gc_inode_usage_threshold = 70 +} + +plugin "nomad-driver-podman" { + config { + socket_path = "unix:///run/podman/podman.sock" + volumes { + enabled = true + } + } +} + +consul { + address = "warden.tailnet-68f9.ts.net:8500" + server_service_name = "nomad" + client_service_name = "nomad-client" + auto_advertise = true + server_auto_join = true + client_auto_join = true +} + +vault { + enabled = true + address = "http://warden.tailnet-68f9.ts.net:8200" + token = "hvs.A5Fu4E1oHyezJapVllKPFsWg" + create_from_role = "nomad-cluster" + tls_skip_verify = true +} diff --git a/test-podman-job.nomad b/test-podman-job.nomad deleted file mode 100644 index a49c5a9..0000000 --- a/test-podman-job.nomad +++ /dev/null @@ -1,28 +0,0 @@ -job "test-podman-job" { - datacenters = ["dc1"] - type = "batch" - - constraint { - attribute = "${node.class}" - value = "" - } - - group "test-podman-group" { - count = 1 - - task "test-podman-task" { - driver = "podman" - - config { - image = "alpine:latest" - command = "echo" - args = ["Hello from Podman on Nomad client!"] - } - - resources { - cpu = 100 # MHz - memory = 64 # MB - } - } - } -} \ No newline at end of file diff --git a/tests/README.md b/tests/README.md deleted file mode 100644 index b8da143..0000000 --- a/tests/README.md +++ /dev/null @@ -1,88 +0,0 @@ -# 测试脚本目录 - -本目录包含了项目的所有测试脚本,按照功能进行了分类组织。 - -## 目录结构 - -``` -tests/ -├── mcp_servers/ # MCP服务器相关测试脚本 -│ ├── test_direct_search.sh -│ ├── test_local_mcp_servers.sh -│ ├── test_mcp_interface.sh -│ ├── test_mcp_servers.sh -│ ├── test_mcp_servers_comprehensive.py -│ ├── test_mcp_servers_improved.py -│ ├── test_mcp_servers_simple.py -│ ├── test_qdrant_ollama_server.py -│ ├── test_qdrant_ollama_tools.sh -│ ├── test_qdrant_ollama_tools_fixed.sh -│ ├── test_search_documents.sh -│ └── test_mcp_search_final.sh -├── mcp_server_test_report.md # MCP服务器测试报告 -├── run_all_tests.sh # 自动化测试运行器 -└── legacy/ # 旧的或不再使用的测试脚本 -``` - -## MCP服务器测试脚本说明 - -### Shell脚本 -- `test_direct_search.sh`: 测试search_documents方法,通过SSH执行Python代码 -- 
`test_local_mcp_servers.sh`: 检查MCP配置,测试服务器可用性(context7, qdrant, qdrant-ollama),验证环境变量 -- `test_mcp_interface.sh`: 通过实际接口测试MCP服务器调用,包括tools/list和qdrant_search方法 -- `test_mcp_servers.sh`: 通过initialize方法调用测试Qdrant和Qdrant-Ollama MCP服务器 -- `test_search_documents.sh`: 添加测试文档并搜索"人工智能"(artificial intelligence) -- `test_qdrant_ollama_tools.sh`: 通过JSON-RPC调用测试search_documents和add_document工具 -- `test_qdrant_ollama_tools_fixed.sh`: 测试search_documents、add_document和list_collections工具 -- `test_mcp_search_final.sh`: 最终版本的MCP搜索测试脚本 - -### Python脚本 -- `test_qdrant_ollama_server.py`: 启动服务器,测试初始化、工具列表、文档添加和搜索功能 -- `test_mcp_servers_comprehensive.py`: 使用asyncio和增强响应处理综合测试MCP服务器 -- `test_mcp_servers_improved.py`: 改进版的MCP服务器测试,使用asyncio和增强响应处理 -- `test_mcp_servers_simple.py`: 简化版MCP服务器测试,使用同步子进程调用 - -## 使用方法 - -### 运行单个测试脚本 -```bash -cd tests/mcp_servers -./test_local_mcp_servers.sh -``` - -或运行Python测试: -```bash -cd tests/mcp_servers -python test_mcp_servers_simple.py -``` - -### 批量运行所有测试 -使用自动化测试运行器脚本,可以一键运行所有测试并生成详细报告: -```bash -cd tests -./run_all_tests.sh -``` - -自动化测试运行器将: -- 自动运行所有Shell和Python测试脚本 -- 彩色输出测试进度和结果 -- 生成详细的测试报告(Markdown格式) -- 统计测试通过率和失败情况 -- 保存测试日志到文件 - -## 注意事项 - -- 所有测试脚本都依赖于正确的环境变量配置 -- 测试前请确保相关服务(context7, qdrant, qdrant-ollama)已启动 -- 某些测试可能需要SSH访问权限 - -## 测试报告 - -`mcp_server_test_report.md` 文件包含了MCP服务器的详细测试结果,包括: -- context7、qdrant和qdrant-ollama三个服务器的测试状态 -- 测试环境和方法说明 -- 发现的问题和解决方案 -- 环境变量配置详情 -- 建议和后续改进方向 - -建议在运行测试脚本前先阅读测试报告,了解当前的测试状态和已知问题。 \ No newline at end of file diff --git a/tests/mcp_server_test_report.md b/tests/mcp_server_test_report.md deleted file mode 100644 index 5d1e1b2..0000000 --- a/tests/mcp_server_test_report.md +++ /dev/null @@ -1,55 +0,0 @@ -# MCP服务器测试报告 - -## 测试概述 -本报告记录了对context7、qdrant和qdrant-ollama三个MCP服务器的测试结果。 - -## 测试环境 -- 测试时间:2025-06-17 -- 测试方法:通过SSH连接到远程服务器进行测试 -- 测试工具:JSON-RPC协议直接调用MCP服务器 - -## 测试结果 - -### 1. context7服务器 -- **状态**:✅ 正常工作 -- **测试内容**: - - 成功初始化 - - 成功获取工具列表 - - 成功执行搜索功能 -- **备注**:context7服务器运行稳定,所有功能正常 - -### 2. qdrant-ollama服务器 -- **状态**:✅ 正常工作(已修复filter参数问题) -- **测试内容**: - - 成功获取工具列表:add_document、search_documents、list_collections和get_collection_info - - 成功使用add_document工具添加文档 - - 成功使用search_documents工具搜索文档 -- **修复记录**: - - **问题**:search_documents工具使用filter参数时出现"Unknown arguments: ['filter']"错误 - - **原因**:参数名称不匹配,工具定义中使用filter,但实现中使用query_filter - - **解决方案**:将工具定义中的filter参数名改为query_filter - - **验证结果**:修复后search_documents工具正常工作,不再出现错误 - -### 3. qdrant服务器 -- **状态**:✅ 正常工作 -- **测试内容**: - - 成功获取工具列表:qdrant_search、qdrant_add和qdrant_delete - - 成功使用qdrant_add工具添加文档 - - 成功使用qdrant_search工具搜索文档 -- **备注**:qdrant服务器运行稳定,所有功能正常 - -## 环境变量配置 -两个服务器都正确配置了以下环境变量: -- QDRANT_URL: http://dev1:6333 (qdrant-ollama) / http://localhost:6333 (qdrant) -- QDRANT_API_KEY: 313131 -- OLLAMA_URL: http://dev1:11434 (仅qdrant-ollama) -- OLLAMA_MODEL: nomic-embed-text (仅qdrant-ollama) -- COLLECTION_NAME: ollama_mcp (qdrant-ollama) / mcp (qdrant) - -## 结论 -所有三个MCP服务器均已成功测试并正常工作。qdrant-ollama服务器的filter参数问题已修复,不再出现"Unknown arguments: ['filter']"错误。所有服务器的核心功能(添加文档、搜索文档)均正常运行。 - -## 建议 -1. 考虑将qdrant_mcp_server.py中的search方法更新为query_points方法,以消除弃用警告 -2. 可以考虑为qdrant-ollama服务器添加更多过滤选项,增强搜索功能 -3. 建议定期测试MCP服务器的功能,确保持续稳定运行 \ No newline at end of file
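
---

附注(补充示例,非原报告内容):针对建议1中"将 search 方法更新为 query_points",下面是一个最小示意,假设使用 qdrant-client ≥ 1.10 的 Python 客户端,连接参数沿用报告中的测试环境配置:

```python
from qdrant_client import QdrantClient

client = QdrantClient(url="http://dev1:6333", api_key="313131")

vec = [0.0] * 1024  # 示例查询向量;实际维度需与集合使用的嵌入模型一致(bge-m3 为 1024 维)

# 旧写法(已弃用,会产生弃用警告):
# hits = client.search(collection_name="mcp", query_vector=vec, limit=3)

# 新写法:query_points 返回 QueryResponse,命中结果在 .points 中
hits = client.query_points(collection_name="mcp", query=vec, limit=3).points
for h in hits:
    print(h.id, h.score)
```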