REMOVE: 删除不再使用的 Terraform 配置文件
Deploy Nomad Configurations / deploy-nomad (push) Failing after 7m45s
Details
Infrastructure CI/CD / Validate Infrastructure (push) Failing after 2m33s
Details
Infrastructure CI/CD / Plan Infrastructure (push) Has been skipped
Details
Infrastructure CI/CD / Apply Infrastructure (push) Has been skipped
Details
Simple Test / test (push) Failing after 2m48s
Details
Deploy Nomad Configurations / deploy-nomad (push) Failing after 7m45s
Details
Infrastructure CI/CD / Validate Infrastructure (push) Failing after 2m33s
Details
Infrastructure CI/CD / Plan Infrastructure (push) Has been skipped
Details
Infrastructure CI/CD / Apply Infrastructure (push) Has been skipped
Details
Simple Test / test (push) Failing after 2m48s
Details
- 移除 nomad-terraform.tf 和 test_opentofu_consul.tf 文件 - 更新 Ansible inventory,注释掉不存在的节点 hcp2 - 修改 inventory.ini,确保节点配置的准确性 - 在 nomad-config 模块中添加 null_provider 以支持新配置 - 更新 influxdb1.hcl,添加 Grafana 和 Prometheus 数据卷配置
This commit is contained in:
parent
45f93cc68c
commit
eff8d3ec6d
|
|
@ -0,0 +1,344 @@
|
|||
# 🎬 Nomad 集群管理交接仪式
|
||||
|
||||
## 📋 交接概述
|
||||
|
||||
**交接时间**: 2025-10-09 12:15 UTC
|
||||
**交接原因**: 当前 AI 助手在 Nomad 集群管理上遇到困难,需要新的 AI 助手接手
|
||||
**交接目标**: 恢复 Nomad 集群稳定运行,实现真正的 GitOps 自动化流程
|
||||
|
||||
---
|
||||
|
||||
## 🏗️ 当前系统架构
|
||||
|
||||
### **核心组件**
|
||||
```
|
||||
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
|
||||
│ Gitea Repo │───▶│ Gitea Actions │───▶│ Ansible Deploy │
|
||||
│ (mgmt.git) │ │ (Workflows) │ │ (Playbooks) │
|
||||
└─────────────────┘ └─────────────────┘ └─────────────────┘
|
||||
│ │ │
|
||||
▼ ▼ ▼
|
||||
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
|
||||
│ Nomad Configs │ │ Webhook API │ │ Nomad Cluster │
|
||||
│ (nomad-configs/) │ │ (Trigger) │ │ (7+ nodes) │
|
||||
└─────────────────┘ └─────────────────┘ └─────────────────┘
|
||||
```
|
||||
|
||||
### **节点分布**
|
||||
- **服务器节点**: ash3c, ch4, warden (Consul 服务器)
|
||||
- **客户端节点**: ash2e, hcp1, influxdb, ash3c, ch4, warden, browser
|
||||
- **网络**: Tailscale 私有网络 (tailnet-68f9.ts.net)
|
||||
|
||||
### **关键目录结构**
|
||||
```
|
||||
/root/mgmt/
|
||||
├── .gitea/workflows/ # Gitea Actions 工作流 (❌ 未启用)
|
||||
│ ├── deploy-nomad.yml # Nomad 部署工作流
|
||||
│ └── ansible-deploy.yml # Ansible 部署工作流
|
||||
├── ansible/ # Ansible 配置和剧本
|
||||
│ ├── inventory/hosts.yml # 当前只有 warden 节点
|
||||
│ ├── ansible.cfg # Ansible 全局配置
|
||||
│ └── fix-warden-zsh.yml # 修复 warden zsh 配置的剧本
|
||||
├── nomad-configs/ # Nomad 配置文件
|
||||
│ ├── nodes/ # 各节点配置文件
|
||||
│ │ ├── warden.hcl # ✅ 成功模板 (基准配置)
|
||||
│ │ ├── hcp1.hcl # ❌ 需要修复
|
||||
│ │ ├── onecloud1.hcl # ❌ 节点已离开
|
||||
│ │ ├── influxdb1.hcl # 状态待确认
|
||||
│ │ ├── ash3c.hcl # 状态待确认
|
||||
│ │ ├── ch4.hcl # 状态待确认
|
||||
│ │ └── browser.hcl # 状态待确认
|
||||
│ ├── servers/ # 服务器节点配置
|
||||
│ ├── templates/ # 配置模板
|
||||
│ │ └── nomad-client.hcl.j2
|
||||
│ └── scripts/deploy.sh # 部署脚本
|
||||
├── nomad-jobs/ # Nomad 作业定义
|
||||
│ ├── consul-cluster-nomad # ❌ pending 状态
|
||||
│ ├── vault-cluster-ha.nomad # ❌ pending 状态
|
||||
│ └── traefik-cloudflare-v3 # ❌ pending 状态
|
||||
├── infrastructure/ # 基础设施代码
|
||||
├── components/ # 组件配置
|
||||
├── deployment/ # 部署相关
|
||||
├── security/ # 安全配置
|
||||
└── scripts/ # 各种脚本
|
||||
├── fix-nomad-nodes.sh # 修复 Nomad 节点脚本
|
||||
└── webhook-deploy.sh # Webhook 部署脚本
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🎯 系统目标
|
||||
|
||||
### **主要目标**
|
||||
1. **高可用 Nomad 集群**: 7+ 节点稳定运行
|
||||
2. **GitOps 自动化**: 代码推送 → 自动部署
|
||||
3. **服务编排**: Consul + Vault + Traefik 完整栈
|
||||
4. **配置一致性**: 所有节点配置统一管理
|
||||
|
||||
### **服务栈目标**
|
||||
```
|
||||
Consul Cluster (服务发现)
|
||||
↓
|
||||
Nomad Cluster (作业编排)
|
||||
↓
|
||||
Vault Cluster (密钥管理)
|
||||
↓
|
||||
Traefik (负载均衡)
|
||||
↓
|
||||
应用服务 (通过 Nomad 部署)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🚨 当前问题分析
|
||||
|
||||
### **核心问题**
|
||||
1. **❌ Gitea Actions 未启用**: `has_actions: false`
|
||||
- 导致 GitOps 流程失效
|
||||
- 工作流文件存在但不执行
|
||||
- 需要手动触发部署
|
||||
|
||||
2. **❌ Nomad 节点不稳定**: 部分节点频繁 down
|
||||
- ash1d: 一直 down
|
||||
- onecloud1: left 集群
|
||||
- 节点间连接问题
|
||||
|
||||
3. **❌ 服务部署失败**: 所有服务都 pending
|
||||
- consul-cluster-nomad: pending
|
||||
- vault-cluster-ha: pending
|
||||
- traefik-cloudflare-v3: pending
|
||||
|
||||
### **具体错误**
|
||||
```bash
|
||||
# Nomad 节点状态
|
||||
ID Node Pool DC Name Status
|
||||
8ec41212 default dc1 ash2e ready
|
||||
217d02f1 default dc1 ash1d down # ❌ 问题节点
|
||||
f99725f8 default dc1 hcp1 ready
|
||||
7610e8cb default dc1 influxdb ready
|
||||
6d1e03b2 default dc1 ash3c ready
|
||||
304efba0 default dc1 ch4 ready
|
||||
22da3f32 default dc1 warden ready
|
||||
c9c32568 default dc1 browser ready
|
||||
|
||||
# Consul 成员状态
|
||||
Node Address Status
|
||||
ash3c 100.116.80.94:8301 alive
|
||||
ch4 100.117.106.136:8301 alive
|
||||
warden 100.122.197.112:8301 alive
|
||||
onecloud1 100.98.209.50:8301 left # ❌ 已离开
|
||||
ash1d 100.81.26.3:8301 left # ❌ 已离开
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 🔧 解决方案建议
|
||||
|
||||
### **优先级 1: 启用 Gitea Actions**
|
||||
```bash
|
||||
# 检查 Gitea 全局 Actions 设置
|
||||
curl -s "http://gitea.tailnet-68f9.ts.net/api/v1/admin/config" | jq '.actions'
|
||||
|
||||
# 启用仓库 Actions
|
||||
curl -X PATCH "http://gitea.tailnet-68f9.ts.net/api/v1/repos/ben/mgmt" \
|
||||
-H "Content-Type: application/json" \
|
||||
-d '{"has_actions": true}'
|
||||
```
|
||||
|
||||
### **优先级 2: 扩展 Ansible Inventory**
|
||||
```bash
|
||||
# 当前 inventory 只有 warden 节点,需要添加所有节点
|
||||
# 编辑 ansible/inventory/hosts.yml 添加所有节点信息
|
||||
|
||||
# 参考当前配置格式:
|
||||
# warden:
|
||||
# ansible_host: 100.122.197.112
|
||||
# ansible_user: ben
|
||||
# ansible_password: "3131"
|
||||
# ansible_become_password: "3131"
|
||||
|
||||
# 需要添加的节点:
|
||||
# - ash2e, ash3c, ch4 (服务器节点)
|
||||
# - hcp1, influxdb, browser (客户端节点)
|
||||
# - 修复或移除 ash1d, onecloud1 (问题节点)
|
||||
```
|
||||
|
||||
### **优先级 3: 使用现有脚本修复节点**
|
||||
```bash
|
||||
# 使用 nomad-configs 目录下的部署脚本
|
||||
cd /root/mgmt/nomad-configs
|
||||
|
||||
# 基于 warden 成功配置修复其他节点
|
||||
./scripts/deploy.sh hcp1
|
||||
./scripts/deploy.sh influxdb1
|
||||
./scripts/deploy.sh ash3c
|
||||
./scripts/deploy.sh ch4
|
||||
./scripts/deploy.sh browser
|
||||
|
||||
# 或者批量部署
|
||||
for node in hcp1 influxdb1 ash3c ch4 browser; do
|
||||
./scripts/deploy.sh $node
|
||||
done
|
||||
```
|
||||
|
||||
### **优先级 4: 验证 GitOps 流程**
|
||||
```bash
|
||||
# 推送测试变更
|
||||
git add .
|
||||
git commit -m "TEST: Trigger GitOps workflow"
|
||||
git push origin main
|
||||
|
||||
# 检查工作流执行
|
||||
curl -s "http://gitea.tailnet-68f9.ts.net/api/v1/repos/ben/mgmt/actions/runs"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## ⚠️ 重要注意事项
|
||||
|
||||
### **不要做的事情**
|
||||
1. **❌ 不要手动修改节点配置**: 会导致配置漂移
|
||||
2. **❌ 不要直接 SSH 到节点**: 使用 Ansible inventory
|
||||
3. **❌ 不要绕过 GitOps 流程**: 所有变更都应该通过 Git
|
||||
|
||||
### **必须遵循的原则**
|
||||
1. **✅ 主客观相统一**: 代码即配置,一切通过仓库管理
|
||||
2. **✅ 自动化优先**: 避免手工操作
|
||||
3. **✅ 一致性保证**: 所有节点配置统一
|
||||
|
||||
### **关键文件**
|
||||
- **Ansible Inventory**: `ansible/inventory/hosts.yml` (当前只有 warden)
|
||||
- **成功配置模板**: `nomad-configs/nodes/warden.hcl` (✅ 基准配置)
|
||||
- **部署脚本**: `nomad-configs/scripts/deploy.sh`
|
||||
- **修复脚本**: `scripts/fix-nomad-nodes.sh`
|
||||
- **工作流**: `.gitea/workflows/deploy-nomad.yml` (❌ 未启用)
|
||||
- **Ansible 配置**: `ansible/ansible.cfg`
|
||||
- **zsh 修复剧本**: `ansible/fix-warden-zsh.yml`
|
||||
|
||||
---
|
||||
|
||||
## 🎯 成功标准
|
||||
|
||||
### **短期目标 (1-2小时)**
|
||||
- [ ] 启用 Gitea Actions
|
||||
- [ ] 修复 ash1d 节点
|
||||
- [ ] 验证 GitOps 流程工作
|
||||
|
||||
### **中期目标 (今天内)**
|
||||
- [ ] 所有 Nomad 节点 ready
|
||||
- [ ] Consul 集群稳定
|
||||
- [ ] Vault 集群部署成功
|
||||
|
||||
### **长期目标 (本周内)**
|
||||
- [ ] 完整的服务栈运行
|
||||
- [ ] 自动化部署流程稳定
|
||||
- [ ] 监控和告警就位
|
||||
|
||||
---
|
||||
|
||||
## 🛠️ 可用工具和脚本
|
||||
|
||||
### **Ansible 剧本**
|
||||
```bash
|
||||
# 修复 warden 节点的 zsh 配置问题
|
||||
ansible-playbook -i ansible/inventory/hosts.yml ansible/fix-warden-zsh.yml
|
||||
|
||||
# 扩展到其他节点 (需要先更新 inventory)
|
||||
ansible-playbook -i ansible/inventory/hosts.yml ansible/fix-warden-zsh.yml --limit all
|
||||
```
|
||||
|
||||
### **Nomad 配置部署**
|
||||
```bash
|
||||
# 使用现有的部署脚本 (基于 warden 成功模板)
|
||||
cd nomad-configs
|
||||
./scripts/deploy.sh <节点名>
|
||||
|
||||
# 可用节点: warden, hcp1, influxdb1, ash3c, ch4, browser
|
||||
# 问题节点: onecloud1 (已离开), ash1d (需要修复)
|
||||
```
|
||||
|
||||
### **系统修复脚本**
|
||||
```bash
|
||||
# 修复 Nomad 节点的通用脚本
|
||||
./scripts/fix-nomad-nodes.sh
|
||||
|
||||
# Webhook 部署脚本
|
||||
./scripts/webhook-deploy.sh
|
||||
```
|
||||
|
||||
### **当前 Ansible Inventory 状态**
|
||||
```yaml
|
||||
# ansible/inventory/hosts.yml - 当前只配置了 warden
|
||||
all:
|
||||
children:
|
||||
warden:
|
||||
hosts:
|
||||
warden:
|
||||
ansible_host: 100.122.197.112
|
||||
ansible_user: ben
|
||||
ansible_password: "3131"
|
||||
ansible_become_password: "3131"
|
||||
|
||||
# ⚠️ 需要添加其他节点的配置信息
|
||||
```
|
||||
|
||||
### **推荐的修复顺序**
|
||||
1. **启用 Gitea Actions** - 恢复 GitOps 自动化
|
||||
2. **扩展 Ansible Inventory** - 添加所有节点配置
|
||||
3. **使用 warden 模板修复节点** - 基于成功配置
|
||||
4. **验证 Nomad 集群状态** - 确保所有节点 ready
|
||||
5. **部署服务栈** - Consul + Vault + Traefik
|
||||
|
||||
---
|
||||
|
||||
## 🆘 紧急联系信息
|
||||
|
||||
**当前 AI 助手**: 遇到困难,需要交接
|
||||
**系统状态**: 部分功能失效,需要修复
|
||||
**紧急程度**: 中等 (服务可用但不稳定)
|
||||
|
||||
**快速诊断检查清单**:
|
||||
```bash
|
||||
# 1. 检查 Gitea Actions 状态 (最重要!)
|
||||
curl -s "http://gitea.tailnet-68f9.ts.net/api/v1/repos/ben/mgmt" | jq '.has_actions'
|
||||
# 期望: true (当前: false ❌)
|
||||
|
||||
# 2. 检查 Nomad 集群状态
|
||||
nomad node status
|
||||
# 期望: 所有节点 ready (当前: ash1d down ❌)
|
||||
|
||||
# 3. 检查 Consul 集群状态
|
||||
consul members
|
||||
# 期望: 3个服务器节点 alive (当前: ash3c, ch4, warden ✅)
|
||||
|
||||
# 4. 检查服务部署状态
|
||||
nomad job status
|
||||
# 期望: 服务 running (当前: 全部 pending ❌)
|
||||
|
||||
# 5. 检查 Ansible 连接
|
||||
ansible all -i ansible/inventory/hosts.yml -m ping
|
||||
# 期望: 所有节点 SUCCESS (当前: 只有 warden ⚠️)
|
||||
|
||||
# 6. 检查网络连通性
|
||||
tailscale status
|
||||
# 期望: 所有节点在线
|
||||
|
||||
# 7. 检查配置文件完整性
|
||||
ls -la nomad-configs/nodes/
|
||||
# 期望: 所有节点都有配置文件 (当前: ✅)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 📝 交接总结
|
||||
|
||||
**当前状态**: 系统部分功能失效,需要新的 AI 助手接手
|
||||
**主要问题**: Gitea Actions 未启用,导致 GitOps 流程失效
|
||||
**解决方案**: 启用 Actions,修复节点,验证自动化流程
|
||||
**成功标准**: 所有节点 ready,服务正常部署,GitOps 流程稳定
|
||||
|
||||
**祝新的 AI 助手好运!** 🍀
|
||||
|
||||
---
|
||||
|
||||
*交接仪式完成 - 2025-10-09 12:15 UTC*
|
||||
|
|
@ -0,0 +1,62 @@
|
|||
---
|
||||
# Ansible 批量修复所有服务器节点的安全配置
|
||||
- name: 修复所有 Nomad 服务器节点的安全配置
|
||||
hosts: ash1d,ash2e,onecloud1
|
||||
gather_facts: no
|
||||
vars:
|
||||
nomad_servers:
|
||||
- "semaphore.tailnet-68f9.ts.net:4647"
|
||||
- "ash1d.tailnet-68f9.ts.net:4647"
|
||||
- "ash2e.tailnet-68f9.ts.net:4647"
|
||||
- "ch2.tailnet-68f9.ts.net:4647"
|
||||
- "ch3.tailnet-68f9.ts.net:4647"
|
||||
- "onecloud1.tailnet-68f9.ts.net:4647"
|
||||
- "de.tailnet-68f9.ts.net:4647"
|
||||
|
||||
tasks:
|
||||
- name: 生成安全的 Nomad 服务器配置
|
||||
template:
|
||||
src: server-secure.hcl.j2
|
||||
dest: /tmp/nomad-secure.hcl
|
||||
mode: '0644'
|
||||
|
||||
- name: 停止 Nomad 服务
|
||||
systemd:
|
||||
name: nomad
|
||||
state: stopped
|
||||
become: yes
|
||||
|
||||
- name: 备份当前配置
|
||||
copy:
|
||||
src: /etc/nomad.d/nomad.hcl
|
||||
dest: "/etc/nomad.d/nomad.hcl.backup.{{ ansible_date_time.epoch }}"
|
||||
remote_src: yes
|
||||
become: yes
|
||||
ignore_errors: yes
|
||||
|
||||
- name: 部署安全配置
|
||||
copy:
|
||||
src: /tmp/nomad-secure.hcl
|
||||
dest: /etc/nomad.d/nomad.hcl
|
||||
remote_src: yes
|
||||
become: yes
|
||||
|
||||
- name: 清理 Raft 数据以重新加入集群
|
||||
file:
|
||||
path: /opt/nomad/data/server/raft/
|
||||
state: absent
|
||||
become: yes
|
||||
|
||||
- name: 启动 Nomad 服务
|
||||
systemd:
|
||||
name: nomad
|
||||
state: started
|
||||
enabled: yes
|
||||
become: yes
|
||||
|
||||
- name: 等待服务启动
|
||||
wait_for:
|
||||
port: 4646
|
||||
host: "{{ inventory_hostname }}.tailnet-68f9.ts.net"
|
||||
delay: 10
|
||||
timeout: 60
|
||||
|
|
@ -0,0 +1,59 @@
|
|||
---
|
||||
# 安全地修复客户端节点配置 - 先客户端,后服务器
|
||||
- name: 修复客户端节点不安全配置
|
||||
hosts: nomad_clients
|
||||
become: yes
|
||||
serial: 1 # 一个一个来,确保安全
|
||||
tasks:
|
||||
- name: 显示当前处理的节点
|
||||
debug:
|
||||
msg: "正在处理客户端节点: {{ inventory_hostname }}"
|
||||
|
||||
- name: 备份当前配置
|
||||
copy:
|
||||
src: /etc/nomad.d/nomad.hcl
|
||||
dest: /etc/nomad.d/nomad.hcl.backup.{{ ansible_date_time.epoch }}
|
||||
backup: yes
|
||||
|
||||
- name: 创建安全的客户端配置
|
||||
template:
|
||||
src: client-secure-template.hcl.j2
|
||||
dest: /etc/nomad.d/nomad.hcl
|
||||
backup: yes
|
||||
notify: restart nomad
|
||||
|
||||
- name: 验证配置文件语法
|
||||
command: nomad config validate /etc/nomad.d/nomad.hcl
|
||||
register: config_validation
|
||||
|
||||
- name: 显示验证结果
|
||||
debug:
|
||||
msg: "{{ inventory_hostname }} 配置验证: {{ config_validation.stdout }}"
|
||||
|
||||
- name: 等待服务重启完成
|
||||
wait_for:
|
||||
port: 4646
|
||||
host: "{% if inventory_hostname == 'influxdb' %}influxdb1.tailnet-68f9.ts.net{% else %}{{ inventory_hostname }}.tailnet-68f9.ts.net{% endif %}"
|
||||
delay: 10
|
||||
timeout: 60
|
||||
delegate_to: localhost
|
||||
|
||||
handlers:
|
||||
- name: restart nomad
|
||||
systemd:
|
||||
name: nomad
|
||||
state: restarted
|
||||
daemon_reload: yes
|
||||
|
||||
post_tasks:
|
||||
- name: 验证节点重新加入集群
|
||||
uri:
|
||||
url: "http://{% if inventory_hostname == 'influxdb' %}influxdb1.tailnet-68f9.ts.net{% else %}{{ inventory_hostname }}.tailnet-68f9.ts.net{% endif %}:4646/v1/agent/self"
|
||||
method: GET
|
||||
register: node_status
|
||||
delegate_to: localhost
|
||||
|
||||
- name: 显示节点状态
|
||||
debug:
|
||||
msg: "{{ inventory_hostname }} 重新加入集群成功"
|
||||
when: node_status.status == 200
|
||||
|
|
@ -0,0 +1,106 @@
|
|||
# Nomad 客户端安全配置模板
|
||||
datacenter = "dc1"
|
||||
data_dir = "/opt/nomad/data"
|
||||
plugin_dir = "/opt/nomad/plugins"
|
||||
log_level = "INFO"
|
||||
name = "{{ inventory_hostname }}"
|
||||
|
||||
# 安全绑定 - 只绑定到 Tailscale 接口
|
||||
{% if inventory_hostname == 'influxdb' %}
|
||||
bind_addr = "influxdb1.tailnet-68f9.ts.net"
|
||||
|
||||
addresses {
|
||||
http = "influxdb1.tailnet-68f9.ts.net"
|
||||
rpc = "influxdb1.tailnet-68f9.ts.net"
|
||||
serf = "influxdb1.tailnet-68f9.ts.net"
|
||||
}
|
||||
|
||||
advertise {
|
||||
http = "influxdb1.tailnet-68f9.ts.net:4646"
|
||||
rpc = "influxdb1.tailnet-68f9.ts.net:4647"
|
||||
serf = "influxdb1.tailnet-68f9.ts.net:4648"
|
||||
}
|
||||
{% else %}
|
||||
bind_addr = "{{ inventory_hostname }}.tailnet-68f9.ts.net"
|
||||
|
||||
addresses {
|
||||
http = "{{ inventory_hostname }}.tailnet-68f9.ts.net"
|
||||
rpc = "{{ inventory_hostname }}.tailnet-68f9.ts.net"
|
||||
serf = "{{ inventory_hostname }}.tailnet-68f9.ts.net"
|
||||
}
|
||||
|
||||
advertise {
|
||||
http = "{{ inventory_hostname }}.tailnet-68f9.ts.net:4646"
|
||||
rpc = "{{ inventory_hostname }}.tailnet-68f9.ts.net:4647"
|
||||
serf = "{{ inventory_hostname }}.tailnet-68f9.ts.net:4648"
|
||||
}
|
||||
{% endif %}
|
||||
|
||||
ports {
|
||||
http = 4646
|
||||
rpc = 4647
|
||||
serf = 4648
|
||||
}
|
||||
|
||||
# 纯客户端模式
|
||||
server {
|
||||
enabled = false
|
||||
}
|
||||
|
||||
client {
|
||||
enabled = true
|
||||
network_interface = "tailscale0"
|
||||
|
||||
# 连接到当前活跃的服务器节点
|
||||
servers = [
|
||||
"ch2.tailnet-68f9.ts.net:4647",
|
||||
"ch3.tailnet-68f9.ts.net:4647",
|
||||
"de.tailnet-68f9.ts.net:4647",
|
||||
"semaphore.tailnet-68f9.ts.net:4647"
|
||||
]
|
||||
|
||||
# 基本驱动
|
||||
options {
|
||||
"driver.raw_exec.enable" = "1"
|
||||
"driver.exec.enable" = "1"
|
||||
}
|
||||
|
||||
# 激进的垃圾清理策略
|
||||
gc_interval = "5m"
|
||||
gc_disk_usage_threshold = 80
|
||||
gc_inode_usage_threshold = 70
|
||||
}
|
||||
|
||||
# Podman 插件配置
|
||||
plugin "nomad-driver-podman" {
|
||||
config {
|
||||
socket_path = "unix:///run/podman/podman.sock"
|
||||
volumes {
|
||||
enabled = true
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# 安全的 Consul 配置 - 指向本地客户端
|
||||
consul {
|
||||
address = "127.0.0.1:8500"
|
||||
server_service_name = "nomad"
|
||||
client_service_name = "nomad-client"
|
||||
auto_advertise = true
|
||||
server_auto_join = true
|
||||
client_auto_join = true
|
||||
}
|
||||
|
||||
# 禁用 Vault - 暂时
|
||||
vault {
|
||||
enabled = false
|
||||
}
|
||||
|
||||
# 遥测配置
|
||||
telemetry {
|
||||
collection_interval = "1s"
|
||||
disable_hostname = false
|
||||
prometheus_metrics = true
|
||||
publish_allocation_metrics = true
|
||||
publish_node_metrics = true
|
||||
}
|
||||
|
|
@ -0,0 +1,97 @@
|
|||
---
|
||||
# 测试本机 semaphore 的偷梁换柱功能
|
||||
- name: 测试 Ansible 偷梁换柱 - 修复 semaphore 不安全配置
|
||||
hosts: localhost
|
||||
become: yes
|
||||
tasks:
|
||||
- name: 备份当前配置
|
||||
copy:
|
||||
src: /etc/nomad.d/nomad.hcl
|
||||
dest: /etc/nomad.d/nomad.hcl.backup.{{ ansible_date_time.epoch }}
|
||||
backup: yes
|
||||
|
||||
- name: 创建安全的 semaphore 配置
|
||||
copy:
|
||||
content: |
|
||||
datacenter = "dc1"
|
||||
data_dir = "/opt/nomad/data"
|
||||
plugin_dir = "/opt/nomad/plugins"
|
||||
log_level = "INFO"
|
||||
name = "semaphore"
|
||||
|
||||
# 安全绑定 - 只绑定到 Tailscale 接口
|
||||
bind_addr = "semaphore.tailnet-68f9.ts.net"
|
||||
|
||||
addresses {
|
||||
http = "semaphore.tailnet-68f9.ts.net"
|
||||
rpc = "semaphore.tailnet-68f9.ts.net"
|
||||
serf = "semaphore.tailnet-68f9.ts.net"
|
||||
}
|
||||
|
||||
advertise {
|
||||
http = "semaphore.tailnet-68f9.ts.net:4646"
|
||||
rpc = "semaphore.tailnet-68f9.ts.net:4647"
|
||||
serf = "semaphore.tailnet-68f9.ts.net:4648"
|
||||
}
|
||||
|
||||
ports {
|
||||
http = 4646
|
||||
rpc = 4647
|
||||
serf = 4648
|
||||
}
|
||||
|
||||
server {
|
||||
enabled = true
|
||||
|
||||
server_join {
|
||||
retry_join = [
|
||||
"semaphore.tailnet-68f9.ts.net:4647",
|
||||
"ash1d.tailnet-68f9.ts.net:4647",
|
||||
"ash2e.tailnet-68f9.ts.net:4647",
|
||||
"ch2.tailnet-68f9.ts.net:4647",
|
||||
"ch3.tailnet-68f9.ts.net:4647",
|
||||
"onecloud1.tailnet-68f9.ts.net:4647",
|
||||
"de.tailnet-68f9.ts.net:4647"
|
||||
]
|
||||
}
|
||||
}
|
||||
|
||||
# 安全的 Consul 配置
|
||||
consul {
|
||||
address = "127.0.0.1:8500"
|
||||
server_service_name = "nomad"
|
||||
client_service_name = "nomad-client"
|
||||
auto_advertise = true
|
||||
server_auto_join = true
|
||||
client_auto_join = true
|
||||
}
|
||||
|
||||
vault {
|
||||
enabled = false
|
||||
}
|
||||
|
||||
telemetry {
|
||||
collection_interval = "1s"
|
||||
disable_hostname = false
|
||||
prometheus_metrics = true
|
||||
publish_allocation_metrics = true
|
||||
publish_node_metrics = true
|
||||
}
|
||||
dest: /etc/nomad.d/nomad.hcl
|
||||
backup: yes
|
||||
notify: restart nomad
|
||||
|
||||
- name: 验证配置文件语法
|
||||
command: nomad config validate /etc/nomad.d/nomad.hcl
|
||||
register: config_validation
|
||||
|
||||
- name: 显示验证结果
|
||||
debug:
|
||||
msg: "配置验证结果: {{ config_validation.stdout }}"
|
||||
|
||||
handlers:
|
||||
- name: restart nomad
|
||||
systemd:
|
||||
name: nomad
|
||||
state: restarted
|
||||
daemon_reload: yes
|
||||
|
|
@ -0,0 +1,30 @@
|
|||
# 检查 ash2e 的磁盘状态
|
||||
data "oci_core_boot_volumes" "ash2e_boot_volumes" {
|
||||
provider = oci.us
|
||||
compartment_id = data.consul_keys.oracle_config_us.var.tenancy_ocid
|
||||
availability_domain = "TZXJ:US-ASHBURN-AD-1"
|
||||
|
||||
filter {
|
||||
name = "display_name"
|
||||
values = ["ash2e"]
|
||||
}
|
||||
}
|
||||
|
||||
# 检查 ash2e 的实例状态
|
||||
data "oci_core_instances" "us_instances" {
|
||||
provider = oci.us
|
||||
compartment_id = data.consul_keys.oracle_config_us.var.tenancy_ocid
|
||||
availability_domain = "TZXJ:US-ASHBURN-AD-1"
|
||||
|
||||
filter {
|
||||
name = "display_name"
|
||||
values = ["ash2e"]
|
||||
}
|
||||
}
|
||||
|
||||
output "ash2e_disk_status" {
|
||||
value = {
|
||||
boot_volumes = data.oci_core_boot_volumes.ash2e_boot_volumes.boot_volumes
|
||||
instances = data.oci_core_instances.us_instances.instances
|
||||
}
|
||||
}
|
||||
|
|
@ -0,0 +1,29 @@
|
|||
# 检查美国区域可用的 Debian 镜像
|
||||
data "oci_core_images" "us_debian_images" {
|
||||
provider = oci.us
|
||||
compartment_id = data.consul_keys.oracle_config_us.var.tenancy_ocid
|
||||
|
||||
# 过滤 Debian 操作系统
|
||||
filter {
|
||||
name = "operating_system"
|
||||
values = ["Debian"]
|
||||
}
|
||||
|
||||
# 按创建时间排序,获取最新的
|
||||
sort_by = "TIMECREATED"
|
||||
sort_order = "DESC"
|
||||
}
|
||||
|
||||
output "debian_images" {
|
||||
value = {
|
||||
debian_images = [
|
||||
for img in data.oci_core_images.us_debian_images.images : {
|
||||
display_name = img.display_name
|
||||
operating_system = img.operating_system
|
||||
operating_system_version = img.operating_system_version
|
||||
id = img.id
|
||||
time_created = img.time_created
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
|
|
@ -0,0 +1,55 @@
|
|||
# 检查现有实例的详细配置
|
||||
data "oci_core_instance" "ash1d" {
|
||||
provider = oci.us
|
||||
instance_id = "ocid1.instance.oc1.iad.anuwcljtkbqyulqcr3ekof6jr5mnmja2gl7vfmwf6s4nnsch6t5osfhwhhfq"
|
||||
}
|
||||
|
||||
data "oci_core_instance" "ash3c" {
|
||||
provider = oci.us
|
||||
instance_id = "ocid1.instance.oc1.iad.anuwcljtkbqyulqczicblxqyu3nxtqv2dqfpaitqgffbrmb7ztu3xiuefhxq"
|
||||
}
|
||||
|
||||
# 获取 VNIC 信息
|
||||
data "oci_core_vnic_attachments" "ash1d_vnics" {
|
||||
provider = oci.us
|
||||
compartment_id = data.consul_keys.oracle_config_us.var.tenancy_ocid
|
||||
instance_id = data.oci_core_instance.ash1d.id
|
||||
}
|
||||
|
||||
data "oci_core_vnic_attachments" "ash3c_vnics" {
|
||||
provider = oci.us
|
||||
compartment_id = data.consul_keys.oracle_config_us.var.tenancy_ocid
|
||||
instance_id = data.oci_core_instance.ash3c.id
|
||||
}
|
||||
|
||||
# 获取 VNIC 详细信息
|
||||
data "oci_core_vnic" "ash1d_vnic" {
|
||||
provider = oci.us
|
||||
vnic_id = data.oci_core_vnic_attachments.ash1d_vnics.vnic_attachments[0].vnic_id
|
||||
}
|
||||
|
||||
data "oci_core_vnic" "ash3c_vnic" {
|
||||
provider = oci.us
|
||||
vnic_id = data.oci_core_vnic_attachments.ash3c_vnics.vnic_attachments[0].vnic_id
|
||||
}
|
||||
|
||||
output "existing_instances_info" {
|
||||
value = {
|
||||
ash1d = {
|
||||
id = data.oci_core_instance.ash1d.id
|
||||
display_name = data.oci_core_instance.ash1d.display_name
|
||||
public_ip = data.oci_core_instance.ash1d.public_ip
|
||||
private_ip = data.oci_core_instance.ash1d.private_ip
|
||||
subnet_id = data.oci_core_instance.ash1d.subnet_id
|
||||
ipv6addresses = data.oci_core_vnic.ash1d_vnic.ipv6addresses
|
||||
}
|
||||
ash3c = {
|
||||
id = data.oci_core_instance.ash3c.id
|
||||
display_name = data.oci_core_instance.ash3c.display_name
|
||||
public_ip = data.oci_core_instance.ash3c.public_ip
|
||||
private_ip = data.oci_core_instance.ash3c.private_ip
|
||||
subnet_id = data.oci_core_instance.ash3c.subnet_id
|
||||
ipv6addresses = data.oci_core_vnic.ash3c_vnic.ipv6addresses
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -0,0 +1,109 @@
|
|||
# 检查 ash2e 实例状态
|
||||
terraform {
|
||||
required_providers {
|
||||
oci = {
|
||||
source = "oracle/oci"
|
||||
version = "~> 7.0"
|
||||
}
|
||||
consul = {
|
||||
source = "hashicorp/consul"
|
||||
version = "~> 2.22"
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# 从 Consul 获取美国区域配置
|
||||
data "consul_keys" "oracle_config_us_check" {
|
||||
key {
|
||||
name = "tenancy_ocid"
|
||||
path = "config/dev/oracle/us/tenancy_ocid"
|
||||
}
|
||||
key {
|
||||
name = "user_ocid"
|
||||
path = "config/dev/oracle/us/user_ocid"
|
||||
}
|
||||
key {
|
||||
name = "fingerprint"
|
||||
path = "config/dev/oracle/us/fingerprint"
|
||||
}
|
||||
key {
|
||||
name = "private_key_path"
|
||||
path = "config/dev/oracle/us/private_key_path"
|
||||
}
|
||||
key {
|
||||
name = "region"
|
||||
path = "config/dev/oracle/us/region"
|
||||
}
|
||||
}
|
||||
|
||||
# 配置美国区域 Provider
|
||||
provider "oci" {
|
||||
alias = "us_check"
|
||||
tenancy_ocid = data.consul_keys.oracle_config_us_check.var.tenancy_ocid
|
||||
user_ocid = data.consul_keys.oracle_config_us_check.var.user_ocid
|
||||
fingerprint = data.consul_keys.oracle_config_us_check.var.fingerprint
|
||||
private_key_path = data.consul_keys.oracle_config_us_check.var.private_key_path
|
||||
region = data.consul_keys.oracle_config_us_check.var.region
|
||||
}
|
||||
|
||||
# 获取美国区域的所有实例
|
||||
data "oci_core_instances" "us_instances" {
|
||||
provider = oci.us_check
|
||||
compartment_id = data.consul_keys.oracle_config_us_check.var.tenancy_ocid
|
||||
}
|
||||
|
||||
# 获取美国区域的所有磁盘卷
|
||||
data "oci_core_volumes" "us_volumes" {
|
||||
provider = oci.us_check
|
||||
compartment_id = data.consul_keys.oracle_config_us_check.var.tenancy_ocid
|
||||
}
|
||||
|
||||
# 获取美国区域的所有启动卷
|
||||
data "oci_core_boot_volumes" "us_boot_volumes" {
|
||||
provider = oci.us_check
|
||||
availability_domain = "TZXJ:US-ASHBURN-AD-1"
|
||||
compartment_id = data.consul_keys.oracle_config_us_check.var.tenancy_ocid
|
||||
}
|
||||
|
||||
# 输出所有实例信息
|
||||
output "us_instances_status" {
|
||||
value = {
|
||||
for instance in data.oci_core_instances.us_instances.instances :
|
||||
instance.display_name => {
|
||||
id = instance.id
|
||||
state = instance.state
|
||||
shape = instance.shape
|
||||
availability_domain = instance.availability_domain
|
||||
time_created = instance.time_created
|
||||
}
|
||||
}
|
||||
description = "美国区域所有实例状态"
|
||||
}
|
||||
|
||||
# 输出磁盘状态 - 关键信息!
|
||||
output "us_volumes_status" {
|
||||
value = {
|
||||
for volume in data.oci_core_volumes.us_volumes.volumes :
|
||||
volume.display_name => {
|
||||
id = volume.id
|
||||
state = volume.state
|
||||
size_in_gbs = volume.size_in_gbs
|
||||
time_created = volume.time_created
|
||||
}
|
||||
}
|
||||
description = "美国区域所有数据磁盘状态"
|
||||
}
|
||||
|
||||
# 输出启动磁盘状态 - 更关键!
|
||||
output "us_boot_volumes_status" {
|
||||
value = {
|
||||
for boot_volume in data.oci_core_boot_volumes.us_boot_volumes.boot_volumes :
|
||||
boot_volume.display_name => {
|
||||
id = boot_volume.id
|
||||
state = boot_volume.state
|
||||
size_in_gbs = boot_volume.size_in_gbs
|
||||
time_created = boot_volume.time_created
|
||||
}
|
||||
}
|
||||
description = "美国区域所有启动磁盘状态 - ash2e 的配置可能还在这里!"
|
||||
}
|
||||
|
|
@ -0,0 +1,38 @@
|
|||
# 检查美国区域可用的操作系统镜像
|
||||
data "oci_core_images" "us_images" {
|
||||
provider = oci.us
|
||||
compartment_id = data.consul_keys.oracle_config_us.var.tenancy_ocid
|
||||
|
||||
# 过滤操作系统
|
||||
filter {
|
||||
name = "operating_system"
|
||||
values = ["Canonical Ubuntu", "Oracle Linux"]
|
||||
}
|
||||
|
||||
# 按创建时间排序,获取最新的
|
||||
sort_by = "TIMECREATED"
|
||||
sort_order = "DESC"
|
||||
}
|
||||
|
||||
output "available_os_images" {
|
||||
value = {
|
||||
ubuntu_images = [
|
||||
for img in data.oci_core_images.us_images.images : {
|
||||
display_name = img.display_name
|
||||
operating_system = img.operating_system
|
||||
operating_system_version = img.operating_system_version
|
||||
id = img.id
|
||||
time_created = img.time_created
|
||||
} if img.operating_system == "Canonical Ubuntu"
|
||||
]
|
||||
oracle_linux_images = [
|
||||
for img in data.oci_core_images.us_images.images : {
|
||||
display_name = img.display_name
|
||||
operating_system = img.operating_system
|
||||
operating_system_version = img.operating_system_version
|
||||
id = img.id
|
||||
time_created = img.time_created
|
||||
} if img.operating_system == "Oracle Linux"
|
||||
]
|
||||
}
|
||||
}
|
||||
|
|
@ -0,0 +1,20 @@
|
|||
# 检查美国区域所有实例
|
||||
data "oci_core_instances" "us_all_instances" {
|
||||
provider = oci.us
|
||||
compartment_id = data.consul_keys.oracle_config_us.var.tenancy_ocid
|
||||
}
|
||||
|
||||
output "us_all_instances_summary" {
|
||||
value = {
|
||||
total_count = length(data.oci_core_instances.us_all_instances.instances)
|
||||
instances = [
|
||||
for instance in data.oci_core_instances.us_all_instances.instances : {
|
||||
name = instance.display_name
|
||||
state = instance.state
|
||||
shape = instance.shape
|
||||
id = instance.id
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
|
||||
|
|
@ -0,0 +1,105 @@
|
|||
# 创建 ash2e 实例配置
|
||||
resource "oci_core_instance" "ash2e" {
|
||||
provider = oci.us
|
||||
|
||||
# 基本配置
|
||||
compartment_id = data.consul_keys.oracle_config_us.var.tenancy_ocid
|
||||
availability_domain = "TZXJ:US-ASHBURN-AD-1"
|
||||
shape = "VM.Standard.E2.1.Micro"
|
||||
display_name = "ash2e"
|
||||
|
||||
# 使用 Ubuntu 24.04 LTS
|
||||
source_details {
|
||||
source_type = "image"
|
||||
source_id = "ocid1.image.oc1.iad.aaaaaaaahmozwney6aptbe6dgdh3iledjxr2v6q74fjpatgnwiekedftmm2q" # Ubuntu 24.04 LTS
|
||||
|
||||
boot_volume_size_in_gbs = 50
|
||||
boot_volume_vpus_per_gb = 10
|
||||
}
|
||||
|
||||
# 网络配置 - 启用 IPv6,自动分配
|
||||
create_vnic_details {
|
||||
assign_public_ip = true
|
||||
assign_ipv6ip = true # 启用 IPv6,让 Oracle 自动分配
|
||||
hostname_label = "ash2e"
|
||||
subnet_id = "ocid1.subnet.oc1.iad.aaaaaaaapkx25eckkl3dps67o35iprz2gkqjd5bo3rc4rxf4si5hyj2ocara" # 使用 ash1d 的子网
|
||||
}
|
||||
|
||||
# SSH 密钥 - 使用本机的公钥
|
||||
metadata = {
|
||||
ssh_authorized_keys = "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIMSUUfma8FKEFvH8Nq65XM2PZ9kitfgv1q727cKV9y5Z houzhongxu@seekkey.tech"
|
||||
user_data = base64encode(<<-EOF
|
||||
#!/bin/bash
|
||||
# 创建 ben 用户
|
||||
useradd -m -s /bin/bash ben
|
||||
usermod -aG sudo ben
|
||||
|
||||
# 为 ben 用户添加 SSH 密钥
|
||||
mkdir -p /home/ben/.ssh
|
||||
echo "ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIMSUUfma8FKEFvH8Nq65XM2PZ9kitfgv1q727cKV9y5Z houzhongxu@seekkey.tech" >> /home/ben/.ssh/authorized_keys
|
||||
chown -R ben:ben /home/ben/.ssh
|
||||
chmod 700 /home/ben/.ssh
|
||||
chmod 600 /home/ben/.ssh/authorized_keys
|
||||
|
||||
# 更新系统
|
||||
apt update && apt upgrade -y
|
||||
|
||||
# 安装常用工具
|
||||
apt install -y curl wget git vim htop
|
||||
|
||||
# 配置主机名
|
||||
hostnamectl set-hostname ash2e
|
||||
|
||||
# 重启网络服务以获取 IPv6
|
||||
systemctl restart networking
|
||||
EOF
|
||||
)
|
||||
}
|
||||
|
||||
# 临时禁用保护以便重新创建
|
||||
lifecycle {
|
||||
prevent_destroy = false
|
||||
ignore_changes = [
|
||||
source_details,
|
||||
metadata,
|
||||
create_vnic_details,
|
||||
time_created
|
||||
]
|
||||
}
|
||||
}
|
||||
|
||||
# 获取子网信息
|
||||
data "oci_core_subnets" "us_subnets" {
|
||||
provider = oci.us
|
||||
compartment_id = data.consul_keys.oracle_config_us.var.tenancy_ocid
|
||||
vcn_id = data.oci_core_vcns.us_vcns.virtual_networks[0].id
|
||||
}
|
||||
|
||||
# 获取 VCN 信息
|
||||
data "oci_core_vcns" "us_vcns" {
|
||||
provider = oci.us
|
||||
compartment_id = data.consul_keys.oracle_config_us.var.tenancy_ocid
|
||||
}
|
||||
|
||||
output "ash2e_instance_info" {
|
||||
value = {
|
||||
id = oci_core_instance.ash2e.id
|
||||
public_ip = oci_core_instance.ash2e.public_ip
|
||||
private_ip = oci_core_instance.ash2e.private_ip
|
||||
state = oci_core_instance.ash2e.state
|
||||
display_name = oci_core_instance.ash2e.display_name
|
||||
}
|
||||
}
|
||||
|
||||
output "us_subnets_info" {
|
||||
value = {
|
||||
subnets = [
|
||||
for subnet in data.oci_core_subnets.us_subnets.subnets : {
|
||||
id = subnet.id
|
||||
display_name = subnet.display_name
|
||||
cidr_block = subnet.cidr_block
|
||||
availability_domain = subnet.availability_domain
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
|
|
@ -4,7 +4,7 @@
|
|||
# ⚠️ 任何对服务器节点的操作都可能影响整个集群的稳定性!
|
||||
semaphore ansible_host=127.0.0.1 ansible_user=root ansible_password=3131 ansible_become_password=3131 ansible_ssh_common_args="-o PreferredAuthentications=password -o PubkeyAuthentication=no"
|
||||
ash1d ansible_host=ash1d.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
ash2e ansible_host=ash2e.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
ash2e ansible_host=ash2e.tailnet-68f9.ts.net ansible_user=ben
|
||||
ch2 ansible_host=ch2.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
ch3 ansible_host=ch3.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
onecloud1 ansible_host=onecloud1.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
|
|
|
|||
|
|
@ -61,7 +61,7 @@ kali ansible_host=kali ansible_user=ben ansible_become=yes ansible_become_pass=3
|
|||
|
||||
[hcp]
|
||||
hcp1 ansible_host=hcp1 ansible_user=root ansible_become=yes ansible_become_pass=313131
|
||||
hcp2 ansible_host=hcp2 ansible_user=root ansible_become=yes ansible_become_pass=313131
|
||||
# hcp2 ansible_host=hcp2 ansible_user=root ansible_become=yes ansible_become_pass=313131 # 节点不存在,已注释 (2025-10-10)
|
||||
|
||||
[feiniu]
|
||||
snail ansible_host=snail ansible_user=houzhongxu ansible_ssh_pass=Aa313131@ben ansible_become=yes ansible_become_pass=Aa313131@ben
|
||||
|
|
|
|||
|
|
@ -0,0 +1,23 @@
|
|||
apiVersion: 1
|
||||
|
||||
datasources:
|
||||
- name: Prometheus
|
||||
type: prometheus
|
||||
access: proxy
|
||||
url: http://prometheus.tailnet-68f9.ts.net:9090
|
||||
isDefault: true
|
||||
editable: true
|
||||
|
||||
- name: InfluxDB
|
||||
type: influxdb
|
||||
access: proxy
|
||||
url: http://influxdb1.tailnet-68f9.ts.net:8086
|
||||
database: VPS
|
||||
user: admin
|
||||
secureJsonData:
|
||||
password: "VU_dOCVZzqEHb9jSFsDe0bJlEBaVbiG4LqfoczlnmcbfrbmklSt904HJPL4idYGvVi0c2eHkYDi2zCTni7Ay4w=="
|
||||
jsonData:
|
||||
httpMode: GET
|
||||
organization: seekkey
|
||||
defaultBucket: VPS
|
||||
editable: true
|
||||
|
|
@ -4,48 +4,60 @@ terraform {
|
|||
source = "hashicorp/nomad"
|
||||
version = "~> 2.0"
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# Nomad 节点配置资源
|
||||
resource "nomad_node_pool" "default" {
|
||||
name = "default"
|
||||
description = "Default node pool for all nodes"
|
||||
}
|
||||
|
||||
# 定义需要修复的节点
|
||||
locals {
|
||||
nomad_nodes = {
|
||||
ch4 = {
|
||||
address = "ch4.tailnet-68f9.ts.net"
|
||||
datacenter = "dc1"
|
||||
node_class = ""
|
||||
}
|
||||
hcp1 = {
|
||||
address = "hcp1.tailnet-68f9.ts.net"
|
||||
datacenter = "dc1"
|
||||
node_class = ""
|
||||
}
|
||||
warden = {
|
||||
address = "warden.tailnet-68f9.ts.net"
|
||||
datacenter = "dc1"
|
||||
node_class = ""
|
||||
}
|
||||
ash1d = {
|
||||
address = "ash1d.tailnet-68f9.ts.net"
|
||||
datacenter = "dc1"
|
||||
node_class = ""
|
||||
}
|
||||
ash2e = {
|
||||
address = "ash2e.tailnet-68f9.ts.net"
|
||||
datacenter = "dc1"
|
||||
node_class = ""
|
||||
null = {
|
||||
source = "hashicorp/null"
|
||||
version = "~> 3.0"
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# 输出节点信息
|
||||
output "nomad_nodes" {
|
||||
value = local.nomad_nodes
|
||||
description = "Nomad 节点配置信息"
|
||||
# 测试 onecloud1 服务器配置
|
||||
resource "null_resource" "deploy_onecloud1_config" {
|
||||
|
||||
provisioner "file" {
|
||||
source = "${path.root}/../../nomad-configs-tofu/onecloud1-server.hcl"
|
||||
destination = "/tmp/nomad.hcl"
|
||||
|
||||
connection {
|
||||
type = "ssh"
|
||||
user = "ben"
|
||||
password = "3131"
|
||||
host = "onecloud1.tailnet-68f9.ts.net"
|
||||
timeout = "30s"
|
||||
}
|
||||
}
|
||||
|
||||
provisioner "remote-exec" {
|
||||
inline = [
|
||||
"echo '开始部署 onecloud1 服务器配置'",
|
||||
"sudo systemctl stop nomad || true",
|
||||
"sudo mkdir -p /etc/nomad.d",
|
||||
"sudo cp /tmp/nomad.hcl /etc/nomad.d/nomad.hcl",
|
||||
"sudo chown nomad:nomad /etc/nomad.d/nomad.hcl",
|
||||
"sudo systemctl start nomad",
|
||||
"sudo systemctl enable nomad",
|
||||
"sleep 15",
|
||||
"sudo systemctl status nomad --no-pager",
|
||||
"echo 'onecloud1 服务器配置部署完成'"
|
||||
]
|
||||
|
||||
connection {
|
||||
type = "ssh"
|
||||
user = "ben"
|
||||
password = "3131"
|
||||
host = "onecloud1.tailnet-68f9.ts.net"
|
||||
timeout = "30s"
|
||||
}
|
||||
}
|
||||
|
||||
# 触发器:配置文件变化时重新部署
|
||||
triggers = {
|
||||
config_hash = filemd5("${path.root}/../../nomad-configs-tofu/onecloud1-server.hcl")
|
||||
}
|
||||
}
|
||||
|
||||
# 输出部署状态
|
||||
output "onecloud1_deployment" {
|
||||
value = "onecloud1 服务器配置已部署"
|
||||
description = "onecloud1 节点部署状态"
|
||||
}
|
||||
|
|
@ -0,0 +1,44 @@
|
|||
job "fix-nomad-nodes-v2" {
|
||||
datacenters = ["dc1"]
|
||||
type = "batch"
|
||||
|
||||
group "node-config" {
|
||||
count = 1
|
||||
|
||||
task "fix-nodes" {
|
||||
driver = "raw_exec"
|
||||
|
||||
config {
|
||||
command = "/bin/bash"
|
||||
args = ["-c", <<EOF
|
||||
echo "开始修复 Nomad 节点配置..."
|
||||
|
||||
# 定义需要修复的节点
|
||||
NODES="ch4 hcp1 warden ash1d"
|
||||
|
||||
for node in $NODES; do
|
||||
echo "修复节点: $node"
|
||||
|
||||
# 通过 SSH 重启 Nomad 服务
|
||||
ssh -o StrictHostKeyChecking=no ben@$node.tailnet-68f9.ts.net "
|
||||
echo '3131' | sudo -S systemctl restart nomad
|
||||
echo '3131' | sudo -S systemctl enable nomad
|
||||
sleep 5
|
||||
systemctl status nomad --no-pager
|
||||
" || echo "节点 $node 修复失败"
|
||||
|
||||
sleep 2
|
||||
done
|
||||
|
||||
echo "节点修复完成"
|
||||
EOF
|
||||
]
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 100
|
||||
memory = 128
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -15,7 +15,7 @@ provider "nomad" {
|
|||
|
||||
# 使用 Nomad 配置模块
|
||||
module "nomad_config" {
|
||||
source = "./infrastructure/opentofu/modules/nomad-config"
|
||||
source = "./modules/nomad-config"
|
||||
|
||||
nomad_address = "http://semaphore.tailnet-68f9.ts.net:4646"
|
||||
datacenter = "dc1"
|
||||
|
|
@ -23,5 +23,5 @@ module "nomad_config" {
|
|||
|
||||
# 输出模块信息
|
||||
output "nomad_nodes_info" {
|
||||
value = module.nomad_config.nomad_nodes
|
||||
value = module.nomad_config.onecloud1_deployment
|
||||
}
|
||||
|
|
@ -0,0 +1,291 @@
|
|||
job "monitoring-stack" {
|
||||
datacenters = ["dc1"]
|
||||
type = "service"
|
||||
|
||||
# Grafana 服务组
|
||||
group "grafana" {
|
||||
count = 1
|
||||
|
||||
volume "grafana-data" {
|
||||
type = "host"
|
||||
read_only = false
|
||||
source = "grafana-data"
|
||||
}
|
||||
|
||||
network {
|
||||
port "http" {
|
||||
static = 3000
|
||||
to = 3000
|
||||
}
|
||||
}
|
||||
|
||||
task "grafana" {
|
||||
driver = "exec"
|
||||
|
||||
volume_mount {
|
||||
volume = "grafana-data"
|
||||
destination = "/opt/grafana/data"
|
||||
read_only = false
|
||||
}
|
||||
|
||||
# 下载和安装 Grafana
|
||||
artifact {
|
||||
source = "https://dl.grafana.com/oss/release/grafana-10.2.0.linux-amd64.tar.gz"
|
||||
destination = "local/"
|
||||
mode = "any"
|
||||
}
|
||||
|
||||
config {
|
||||
command = "/bin/bash"
|
||||
args = [
|
||||
"-c",
|
||||
<<EOF
|
||||
set -e
|
||||
cd local
|
||||
tar -xzf grafana-10.2.0.linux-amd64.tar.gz
|
||||
mv grafana-10.2.0/* /opt/grafana/
|
||||
mkdir -p /opt/grafana/data /opt/grafana/logs /opt/grafana/plugins
|
||||
chmod +x /opt/grafana/bin/grafana-server
|
||||
|
||||
# 创建配置文件
|
||||
cat > /opt/grafana/conf/grafana.ini << 'INICONF'
|
||||
[server]
|
||||
http_port = 3000
|
||||
domain = grafana.tailnet-68f9.ts.net
|
||||
root_url = http://grafana.tailnet-68f9.ts.net:3000
|
||||
|
||||
[database]
|
||||
type = sqlite3
|
||||
path = /opt/grafana/data/grafana.db
|
||||
|
||||
[security]
|
||||
admin_password = admin123
|
||||
|
||||
[users]
|
||||
allow_sign_up = false
|
||||
|
||||
[log]
|
||||
mode = console
|
||||
level = info
|
||||
INICONF
|
||||
|
||||
# 启动 Grafana
|
||||
exec /opt/grafana/bin/grafana-server --config /opt/grafana/conf/grafana.ini
|
||||
EOF
|
||||
]
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 500
|
||||
memory = 1024
|
||||
}
|
||||
|
||||
env {
|
||||
GF_SECURITY_ADMIN_PASSWORD = "admin123"
|
||||
GF_SERVER_DOMAIN = "grafana.tailnet-68f9.ts.net"
|
||||
GF_SERVER_ROOT_URL = "http://grafana.tailnet-68f9.ts.net:3000"
|
||||
}
|
||||
|
||||
service {
|
||||
name = "grafana"
|
||||
port = "http"
|
||||
|
||||
tags = [
|
||||
"grafana",
|
||||
"monitoring",
|
||||
"dashboard"
|
||||
]
|
||||
|
||||
check {
|
||||
type = "http"
|
||||
path = "/api/health"
|
||||
interval = "30s"
|
||||
timeout = "5s"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# Prometheus 服务组
|
||||
group "prometheus" {
|
||||
count = 1
|
||||
|
||||
volume "prometheus-data" {
|
||||
type = "host"
|
||||
read_only = false
|
||||
source = "prometheus-data"
|
||||
}
|
||||
|
||||
network {
|
||||
port "http" {
|
||||
static = 9090
|
||||
to = 9090
|
||||
}
|
||||
}
|
||||
|
||||
task "prometheus" {
|
||||
driver = "exec"
|
||||
|
||||
volume_mount {
|
||||
volume = "prometheus-data"
|
||||
destination = "/opt/prometheus/data"
|
||||
read_only = false
|
||||
}
|
||||
|
||||
# 下载和安装 Prometheus
|
||||
artifact {
|
||||
source = "https://github.com/prometheus/prometheus/releases/download/v2.48.0/prometheus-2.48.0.linux-amd64.tar.gz"
|
||||
destination = "local/"
|
||||
mode = "any"
|
||||
}
|
||||
|
||||
config {
|
||||
command = "/bin/bash"
|
||||
args = [
|
||||
"-c",
|
||||
<<EOF
|
||||
set -e
|
||||
cd local
|
||||
tar -xzf prometheus-2.48.0.linux-amd64.tar.gz
|
||||
mv prometheus-2.48.0/* /opt/prometheus/
|
||||
chmod +x /opt/prometheus/prometheus
|
||||
|
||||
# 创建配置文件
|
||||
cat > /opt/prometheus/prometheus.yml << 'PROMCONF'
|
||||
global:
|
||||
scrape_interval: 15s
|
||||
evaluation_interval: 15s
|
||||
|
||||
scrape_configs:
|
||||
- job_name: 'prometheus'
|
||||
static_configs:
|
||||
- targets: ['localhost:9090']
|
||||
|
||||
- job_name: 'node-exporter'
|
||||
static_configs:
|
||||
- targets: ['node-exporter.tailnet-68f9.ts.net:9100']
|
||||
|
||||
- job_name: 'consul'
|
||||
static_configs:
|
||||
- targets:
|
||||
- 'ch4.tailnet-68f9.ts.net:8500'
|
||||
- 'ash3c.tailnet-68f9.ts.net:8500'
|
||||
- 'warden.tailnet-68f9.ts.net:8500'
|
||||
|
||||
- job_name: 'nomad'
|
||||
static_configs:
|
||||
- targets:
|
||||
- 'semaphore.tailnet-68f9.ts.net:4646'
|
||||
- 'ash1d.tailnet-68f9.ts.net:4646'
|
||||
- 'ash2e.tailnet-68f9.ts.net:4646'
|
||||
- 'ch2.tailnet-68f9.ts.net:4646'
|
||||
- 'ch3.tailnet-68f9.ts.net:4646'
|
||||
- 'onecloud1.tailnet-68f9.ts.net:4646'
|
||||
- 'de.tailnet-68f9.ts.net:4646'
|
||||
|
||||
- job_name: 'vault'
|
||||
static_configs:
|
||||
- targets:
|
||||
- 'master.tailnet-68f9.ts.net:8200'
|
||||
- 'ash3c.tailnet-68f9.ts.net:8200'
|
||||
- 'hcp1.tailnet-68f9.ts.net:8200'
|
||||
|
||||
- job_name: 'influxdb'
|
||||
static_configs:
|
||||
- targets: ['influxdb1.tailnet-68f9.ts.net:8086']
|
||||
PROMCONF
|
||||
|
||||
# 启动 Prometheus
|
||||
exec /opt/prometheus/prometheus --config.file=/opt/prometheus/prometheus.yml --storage.tsdb.path=/opt/prometheus/data --web.console.libraries=/opt/prometheus/console_libraries --web.console.templates=/opt/prometheus/consoles --storage.tsdb.retention.time=15d --web.enable-lifecycle
|
||||
EOF
|
||||
]
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 500
|
||||
memory = 1024
|
||||
}
|
||||
|
||||
service {
|
||||
name = "prometheus"
|
||||
port = "http"
|
||||
|
||||
tags = [
|
||||
"prometheus",
|
||||
"monitoring",
|
||||
"metrics"
|
||||
]
|
||||
|
||||
check {
|
||||
type = "http"
|
||||
path = "/-/healthy"
|
||||
interval = "30s"
|
||||
timeout = "5s"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# Node Exporter 服务组
|
||||
group "node-exporter" {
|
||||
count = 1
|
||||
|
||||
network {
|
||||
port "metrics" {
|
||||
static = 9100
|
||||
to = 9100
|
||||
}
|
||||
}
|
||||
|
||||
task "node-exporter" {
|
||||
driver = "exec"
|
||||
|
||||
# 下载和安装 Node Exporter
|
||||
artifact {
|
||||
source = "https://github.com/prometheus/node_exporter/releases/download/v1.7.0/node_exporter-1.7.0.linux-amd64.tar.gz"
|
||||
destination = "local/"
|
||||
mode = "any"
|
||||
}
|
||||
|
||||
config {
|
||||
command = "/bin/bash"
|
||||
args = [
|
||||
"-c",
|
||||
<<EOF
|
||||
set -e
|
||||
cd local
|
||||
tar -xzf node_exporter-1.7.0.linux-amd64.tar.gz
|
||||
mv node_exporter-1.7.0/node_exporter /opt/node-exporter/
|
||||
chmod +x /opt/node-exporter/node_exporter
|
||||
|
||||
# 启动 Node Exporter
|
||||
exec /opt/node-exporter/node_exporter --path.procfs=/proc --path.rootfs=/ --path.sysfs=/sys --collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)
|
||||
EOF
|
||||
]
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 100
|
||||
memory = 256
|
||||
}
|
||||
|
||||
service {
|
||||
name = "node-exporter"
|
||||
port = "metrics"
|
||||
|
||||
tags = [
|
||||
"node-exporter",
|
||||
"monitoring",
|
||||
"metrics"
|
||||
]
|
||||
|
||||
check {
|
||||
type = "http"
|
||||
path = "/metrics"
|
||||
interval = "30s"
|
||||
timeout = "5s"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -0,0 +1,197 @@
|
|||
job "monitoring-stack" {
|
||||
datacenters = ["dc1"]
|
||||
type = "service"
|
||||
|
||||
# Grafana 服务组
|
||||
group "grafana" {
|
||||
count = 1
|
||||
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
operator = "="
|
||||
value = "influxdb"
|
||||
}
|
||||
|
||||
volume "grafana-data" {
|
||||
type = "host"
|
||||
read_only = false
|
||||
source = "grafana-data"
|
||||
}
|
||||
|
||||
network {
|
||||
port "http" {
|
||||
static = 3000
|
||||
to = 3000
|
||||
}
|
||||
}
|
||||
|
||||
task "grafana" {
|
||||
driver = "exec"
|
||||
|
||||
volume_mount {
|
||||
volume = "grafana-data"
|
||||
destination = "/var/lib/grafana"
|
||||
read_only = false
|
||||
}
|
||||
|
||||
config {
|
||||
command = "/usr/sbin/grafana-server"
|
||||
args = [
|
||||
"--config", "/etc/grafana/grafana.ini",
|
||||
"--homepath", "/usr/share/grafana",
|
||||
"cfg:default.paths.data=/var/lib/grafana",
|
||||
"cfg:default.paths.logs=/var/log/grafana",
|
||||
"cfg:default.paths.plugins=/var/lib/grafana/plugins",
|
||||
"cfg:default.paths.provisioning=/etc/grafana/provisioning"
|
||||
]
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 500
|
||||
memory = 1024
|
||||
}
|
||||
|
||||
env {
|
||||
GF_SECURITY_ADMIN_PASSWORD = "admin123"
|
||||
GF_SERVER_DOMAIN = "grafana.tailnet-68f9.ts.net"
|
||||
GF_SERVER_ROOT_URL = "http://grafana.tailnet-68f9.ts.net:3000"
|
||||
}
|
||||
|
||||
service {
|
||||
name = "grafana"
|
||||
port = "http"
|
||||
|
||||
tags = [
|
||||
"grafana",
|
||||
"monitoring",
|
||||
"dashboard"
|
||||
]
|
||||
|
||||
check {
|
||||
type = "http"
|
||||
path = "/api/health"
|
||||
interval = "30s"
|
||||
timeout = "5s"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# Prometheus 服务组
|
||||
group "prometheus" {
|
||||
count = 1
|
||||
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
operator = "="
|
||||
value = "influxdb"
|
||||
}
|
||||
|
||||
volume "prometheus-data" {
|
||||
type = "host"
|
||||
read_only = false
|
||||
source = "prometheus-data"
|
||||
}
|
||||
|
||||
network {
|
||||
port "http" {
|
||||
static = 9090
|
||||
to = 9090
|
||||
}
|
||||
}
|
||||
|
||||
task "prometheus" {
|
||||
driver = "exec"
|
||||
|
||||
volume_mount {
|
||||
volume = "prometheus-data"
|
||||
destination = "/var/lib/prometheus"
|
||||
read_only = false
|
||||
}
|
||||
|
||||
config {
|
||||
command = "/usr/bin/prometheus"
|
||||
args = [
|
||||
"--config.file=/etc/prometheus/prometheus.yml",
|
||||
"--storage.tsdb.path=/var/lib/prometheus",
|
||||
"--web.console.libraries=/etc/prometheus/console_libraries",
|
||||
"--web.console.templates=/etc/prometheus/consoles",
|
||||
"--storage.tsdb.retention.time=15d",
|
||||
"--web.enable-lifecycle"
|
||||
]
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 500
|
||||
memory = 1024
|
||||
}
|
||||
|
||||
service {
|
||||
name = "prometheus"
|
||||
port = "http"
|
||||
|
||||
tags = [
|
||||
"prometheus",
|
||||
"monitoring",
|
||||
"metrics"
|
||||
]
|
||||
|
||||
check {
|
||||
type = "http"
|
||||
path = "/-/healthy"
|
||||
interval = "30s"
|
||||
timeout = "5s"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# Node Exporter 服务组
|
||||
group "node-exporter" {
|
||||
count = 1
|
||||
|
||||
network {
|
||||
port "metrics" {
|
||||
static = 9100
|
||||
to = 9100
|
||||
}
|
||||
}
|
||||
|
||||
task "node-exporter" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "/usr/bin/prometheus-node-exporter"
|
||||
args = [
|
||||
"--path.procfs=/proc",
|
||||
"--path.rootfs=/",
|
||||
"--path.sysfs=/sys",
|
||||
"--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)"
|
||||
]
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 100
|
||||
memory = 256
|
||||
}
|
||||
|
||||
service {
|
||||
name = "node-exporter"
|
||||
port = "metrics"
|
||||
|
||||
tags = [
|
||||
"node-exporter",
|
||||
"monitoring",
|
||||
"metrics"
|
||||
]
|
||||
|
||||
check {
|
||||
type = "http"
|
||||
path = "/metrics"
|
||||
interval = "30s"
|
||||
timeout = "5s"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -0,0 +1,186 @@
|
|||
job "monitoring-stack" {
|
||||
datacenters = ["dc1"]
|
||||
type = "service"
|
||||
|
||||
# Grafana 服务组
|
||||
group "grafana" {
|
||||
count = 1
|
||||
|
||||
volume "grafana-data" {
|
||||
type = "host"
|
||||
read_only = false
|
||||
source = "grafana-data"
|
||||
}
|
||||
|
||||
network {
|
||||
port "http" {
|
||||
static = 3000
|
||||
to = 3000
|
||||
}
|
||||
}
|
||||
|
||||
task "grafana" {
|
||||
driver = "exec"
|
||||
|
||||
volume_mount {
|
||||
volume = "grafana-data"
|
||||
destination = "/opt/grafana/data"
|
||||
read_only = false
|
||||
}
|
||||
|
||||
config {
|
||||
command = "/opt/grafana/bin/grafana-server"
|
||||
args = [
|
||||
"--config", "/opt/grafana/conf/grafana.ini",
|
||||
"--homepath", "/opt/grafana",
|
||||
"cfg:default.paths.data=/opt/grafana/data",
|
||||
"cfg:default.paths.logs=/opt/grafana/logs",
|
||||
"cfg:default.paths.plugins=/opt/grafana/plugins",
|
||||
"cfg:default.paths.provisioning=/opt/grafana/conf/provisioning"
|
||||
]
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 500
|
||||
memory = 1024
|
||||
}
|
||||
|
||||
env {
|
||||
GF_SECURITY_ADMIN_PASSWORD = "admin123"
|
||||
GF_INSTALL_PLUGINS = "grafana-piechart-panel"
|
||||
GF_SERVER_DOMAIN = "grafana.tailnet-68f9.ts.net"
|
||||
GF_SERVER_ROOT_URL = "http://grafana.tailnet-68f9.ts.net:3000"
|
||||
}
|
||||
|
||||
service {
|
||||
name = "grafana"
|
||||
port = "http"
|
||||
|
||||
tags = [
|
||||
"grafana",
|
||||
"monitoring",
|
||||
"dashboard"
|
||||
]
|
||||
|
||||
check {
|
||||
type = "http"
|
||||
path = "/api/health"
|
||||
interval = "30s"
|
||||
timeout = "5s"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# Prometheus 服务组
|
||||
group "prometheus" {
|
||||
count = 1
|
||||
|
||||
volume "prometheus-data" {
|
||||
type = "host"
|
||||
read_only = false
|
||||
source = "prometheus-data"
|
||||
}
|
||||
|
||||
network {
|
||||
port "http" {
|
||||
static = 9090
|
||||
to = 9090
|
||||
}
|
||||
}
|
||||
|
||||
task "prometheus" {
|
||||
driver = "exec"
|
||||
|
||||
volume_mount {
|
||||
volume = "prometheus-data"
|
||||
destination = "/opt/prometheus/data"
|
||||
read_only = false
|
||||
}
|
||||
|
||||
config {
|
||||
command = "/opt/prometheus/prometheus"
|
||||
args = [
|
||||
"--config.file=/opt/prometheus/prometheus.yml",
|
||||
"--storage.tsdb.path=/opt/prometheus/data",
|
||||
"--web.console.libraries=/opt/prometheus/console_libraries",
|
||||
"--web.console.templates=/opt/prometheus/consoles",
|
||||
"--storage.tsdb.retention.time=15d",
|
||||
"--web.enable-lifecycle"
|
||||
]
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 500
|
||||
memory = 1024
|
||||
}
|
||||
|
||||
service {
|
||||
name = "prometheus"
|
||||
port = "http"
|
||||
|
||||
tags = [
|
||||
"prometheus",
|
||||
"monitoring",
|
||||
"metrics"
|
||||
]
|
||||
|
||||
check {
|
||||
type = "http"
|
||||
path = "/-/healthy"
|
||||
interval = "30s"
|
||||
timeout = "5s"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# Node Exporter 服务组
|
||||
group "node-exporter" {
|
||||
count = 1
|
||||
|
||||
network {
|
||||
port "metrics" {
|
||||
static = 9100
|
||||
to = 9100
|
||||
}
|
||||
}
|
||||
|
||||
task "node-exporter" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "/opt/node-exporter/node_exporter"
|
||||
args = [
|
||||
"--path.procfs=/proc",
|
||||
"--path.rootfs=/",
|
||||
"--path.sysfs=/sys",
|
||||
"--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)"
|
||||
]
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 100
|
||||
memory = 256
|
||||
}
|
||||
|
||||
service {
|
||||
name = "node-exporter"
|
||||
port = "metrics"
|
||||
|
||||
tags = [
|
||||
"node-exporter",
|
||||
"monitoring",
|
||||
"metrics"
|
||||
]
|
||||
|
||||
check {
|
||||
type = "http"
|
||||
path = "/metrics"
|
||||
interval = "30s"
|
||||
timeout = "5s"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -0,0 +1,87 @@
|
|||
# OpenTofu 小王 - 客户端配置部署
|
||||
terraform {
|
||||
required_providers {
|
||||
null = {
|
||||
source = "registry.opentofu.org/hashicorp/null"
|
||||
version = "3.2.4"
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# 6个客户端节点列表
|
||||
variable "client_nodes" {
|
||||
type = list(string)
|
||||
default = [
|
||||
"hcp1",
|
||||
"influxdb",
|
||||
"ash3c",
|
||||
"ch4",
|
||||
"warden",
|
||||
"browser"
|
||||
]
|
||||
}
|
||||
|
||||
# 为每个客户端节点生成配置文件
|
||||
resource "local_file" "client_configs" {
|
||||
for_each = toset(var.client_nodes)
|
||||
|
||||
filename = "${path.module}/generated/${each.key}-client.hcl"
|
||||
content = replace(
|
||||
file("${path.module}/../nomad-configs-tofu/client-template.hcl"),
|
||||
"NODE_NAME",
|
||||
each.key
|
||||
)
|
||||
}
|
||||
|
||||
# 部署配置到每个客户端节点
|
||||
resource "null_resource" "client_deploy" {
|
||||
for_each = toset(var.client_nodes)
|
||||
|
||||
depends_on = [local_file.client_configs]
|
||||
|
||||
provisioner "local-exec" {
|
||||
command = <<EOF
|
||||
echo "=== 部署客户端配置到 ${each.key} ==="
|
||||
echo "开始时间: $(date)"
|
||||
|
||||
echo "1. 测试连接 ${each.key}..."
|
||||
ping -c 1 ${each.key}.tailnet-68f9.ts.net || echo " - ${each.key} ping 失败"
|
||||
|
||||
echo "2. 上传配置文件..."
|
||||
sshpass -p '3131' scp -o StrictHostKeyChecking=no -o ConnectTimeout=5 \
|
||||
${path.module}/generated/${each.key}-client.hcl \
|
||||
ben@${each.key}.tailnet-68f9.ts.net:/tmp/nomad-new.hcl && echo " - 文件上传成功" || echo " - 文件上传失败"
|
||||
|
||||
echo "3. 部署配置并重启服务..."
|
||||
sshpass -p '3131' ssh -o StrictHostKeyChecking=no -o ConnectTimeout=5 \
|
||||
ben@${each.key}.tailnet-68f9.ts.net \
|
||||
"echo '=== ${each.key} 客户端配置部署开始 ==='; \
|
||||
echo '3131' | sudo -S systemctl stop nomad; \
|
||||
echo '备份旧配置...'; \
|
||||
echo '3131' | sudo -S cp /etc/nomad.d/nomad.hcl /etc/nomad.d/nomad.hcl.backup.\$(date +%Y%m%d_%H%M%S) 2>/dev/null || true; \
|
||||
echo '替换配置文件...'; \
|
||||
echo '3131' | sudo -S cp /tmp/nomad-new.hcl /etc/nomad.d/nomad.hcl; \
|
||||
echo '启动服务...'; \
|
||||
echo '3131' | sudo -S systemctl start nomad; \
|
||||
sleep 5; \
|
||||
echo '检查服务状态...'; \
|
||||
echo '3131' | sudo -S systemctl status nomad --no-pager; \
|
||||
echo '=== ${each.key} 部署完成 ==='" && echo " - ${each.key} 部署成功" || echo " - ${each.key} 部署失败"
|
||||
|
||||
echo "=== ${each.key} 配置部署完成!时间: $(date) ==="
|
||||
EOF
|
||||
}
|
||||
|
||||
triggers = {
|
||||
config_hash = local_file.client_configs[each.key].content_md5
|
||||
deploy_time = timestamp()
|
||||
}
|
||||
}
|
||||
|
||||
output "deployment_summary" {
|
||||
value = {
|
||||
client_nodes = var.client_nodes
|
||||
config_files = [for node in var.client_nodes : "${node}-client.hcl"]
|
||||
deploy_time = timestamp()
|
||||
}
|
||||
}
|
||||
|
|
@ -0,0 +1,62 @@
|
|||
# Nomad 客户端节点极简配置模板
|
||||
datacenter = "dc1"
|
||||
data_dir = "/opt/nomad/data"
|
||||
log_level = "INFO"
|
||||
name = "ash3c"
|
||||
|
||||
bind_addr = "ash3c.tailnet-68f9.ts.net"
|
||||
|
||||
addresses {
|
||||
http = "ash3c.tailnet-68f9.ts.net"
|
||||
rpc = "ash3c.tailnet-68f9.ts.net"
|
||||
serf = "ash3c.tailnet-68f9.ts.net"
|
||||
}
|
||||
|
||||
advertise {
|
||||
http = "ash3c.tailnet-68f9.ts.net:4646"
|
||||
rpc = "ash3c.tailnet-68f9.ts.net:4647"
|
||||
serf = "ash3c.tailnet-68f9.ts.net:4648"
|
||||
}
|
||||
|
||||
ports {
|
||||
http = 4646
|
||||
rpc = 4647
|
||||
serf = 4648
|
||||
}
|
||||
|
||||
# 纯客户端模式
|
||||
server {
|
||||
enabled = false
|
||||
}
|
||||
|
||||
client {
|
||||
enabled = true
|
||||
network_interface = "tailscale0"
|
||||
|
||||
# 连接到当前活跃的服务器节点
|
||||
servers = [
|
||||
"ch2.tailnet-68f9.ts.net:4647",
|
||||
"ash1d.tailnet-68f9.ts.net:4647",
|
||||
"ch3.tailnet-68f9.ts.net:4647",
|
||||
"de.tailnet-68f9.ts.net:4647",
|
||||
"semaphore.tailnet-68f9.ts.net:4647"
|
||||
]
|
||||
|
||||
# 基本驱动
|
||||
options {
|
||||
"driver.raw_exec.enable" = "1"
|
||||
"driver.exec.enable" = "1"
|
||||
}
|
||||
}
|
||||
|
||||
# 极简 Consul 配置
|
||||
consul {
|
||||
address = "127.0.0.1:8500"
|
||||
auto_advertise = true
|
||||
client_auto_join = true
|
||||
}
|
||||
|
||||
# 禁用 Vault - 清理垃圾配置
|
||||
vault {
|
||||
enabled = false
|
||||
}
|
||||
|
|
@ -0,0 +1,62 @@
|
|||
# Nomad 客户端节点极简配置模板
|
||||
datacenter = "dc1"
|
||||
data_dir = "/opt/nomad/data"
|
||||
log_level = "INFO"
|
||||
name = "browser"
|
||||
|
||||
bind_addr = "browser.tailnet-68f9.ts.net"
|
||||
|
||||
addresses {
|
||||
http = "browser.tailnet-68f9.ts.net"
|
||||
rpc = "browser.tailnet-68f9.ts.net"
|
||||
serf = "browser.tailnet-68f9.ts.net"
|
||||
}
|
||||
|
||||
advertise {
|
||||
http = "browser.tailnet-68f9.ts.net:4646"
|
||||
rpc = "browser.tailnet-68f9.ts.net:4647"
|
||||
serf = "browser.tailnet-68f9.ts.net:4648"
|
||||
}
|
||||
|
||||
ports {
|
||||
http = 4646
|
||||
rpc = 4647
|
||||
serf = 4648
|
||||
}
|
||||
|
||||
# 纯客户端模式
|
||||
server {
|
||||
enabled = false
|
||||
}
|
||||
|
||||
client {
|
||||
enabled = true
|
||||
network_interface = "tailscale0"
|
||||
|
||||
# 连接到当前活跃的服务器节点
|
||||
servers = [
|
||||
"ch2.tailnet-68f9.ts.net:4647",
|
||||
"ash1d.tailnet-68f9.ts.net:4647",
|
||||
"ch3.tailnet-68f9.ts.net:4647",
|
||||
"de.tailnet-68f9.ts.net:4647",
|
||||
"semaphore.tailnet-68f9.ts.net:4647"
|
||||
]
|
||||
|
||||
# 基本驱动
|
||||
options {
|
||||
"driver.raw_exec.enable" = "1"
|
||||
"driver.exec.enable" = "1"
|
||||
}
|
||||
}
|
||||
|
||||
# 极简 Consul 配置
|
||||
consul {
|
||||
address = "127.0.0.1:8500"
|
||||
auto_advertise = true
|
||||
client_auto_join = true
|
||||
}
|
||||
|
||||
# 禁用 Vault - 清理垃圾配置
|
||||
vault {
|
||||
enabled = false
|
||||
}
|
||||
|
|
@ -0,0 +1,62 @@
|
|||
# Nomad 客户端节点极简配置模板
|
||||
datacenter = "dc1"
|
||||
data_dir = "/opt/nomad/data"
|
||||
log_level = "INFO"
|
||||
name = "ch4"
|
||||
|
||||
bind_addr = "ch4.tailnet-68f9.ts.net"
|
||||
|
||||
addresses {
|
||||
http = "ch4.tailnet-68f9.ts.net"
|
||||
rpc = "ch4.tailnet-68f9.ts.net"
|
||||
serf = "ch4.tailnet-68f9.ts.net"
|
||||
}
|
||||
|
||||
advertise {
|
||||
http = "ch4.tailnet-68f9.ts.net:4646"
|
||||
rpc = "ch4.tailnet-68f9.ts.net:4647"
|
||||
serf = "ch4.tailnet-68f9.ts.net:4648"
|
||||
}
|
||||
|
||||
ports {
|
||||
http = 4646
|
||||
rpc = 4647
|
||||
serf = 4648
|
||||
}
|
||||
|
||||
# 纯客户端模式
|
||||
server {
|
||||
enabled = false
|
||||
}
|
||||
|
||||
client {
|
||||
enabled = true
|
||||
network_interface = "tailscale0"
|
||||
|
||||
# 连接到当前活跃的服务器节点
|
||||
servers = [
|
||||
"ch2.tailnet-68f9.ts.net:4647",
|
||||
"ash1d.tailnet-68f9.ts.net:4647",
|
||||
"ch3.tailnet-68f9.ts.net:4647",
|
||||
"de.tailnet-68f9.ts.net:4647",
|
||||
"semaphore.tailnet-68f9.ts.net:4647"
|
||||
]
|
||||
|
||||
# 基本驱动
|
||||
options {
|
||||
"driver.raw_exec.enable" = "1"
|
||||
"driver.exec.enable" = "1"
|
||||
}
|
||||
}
|
||||
|
||||
# 极简 Consul 配置
|
||||
consul {
|
||||
address = "127.0.0.1:8500"
|
||||
auto_advertise = true
|
||||
client_auto_join = true
|
||||
}
|
||||
|
||||
# 禁用 Vault - 清理垃圾配置
|
||||
vault {
|
||||
enabled = false
|
||||
}
|
||||
|
|
@ -0,0 +1,62 @@
|
|||
# Nomad 客户端节点极简配置模板
|
||||
datacenter = "dc1"
|
||||
data_dir = "/opt/nomad/data"
|
||||
log_level = "INFO"
|
||||
name = "hcp1"
|
||||
|
||||
bind_addr = "hcp1.tailnet-68f9.ts.net"
|
||||
|
||||
addresses {
|
||||
http = "hcp1.tailnet-68f9.ts.net"
|
||||
rpc = "hcp1.tailnet-68f9.ts.net"
|
||||
serf = "hcp1.tailnet-68f9.ts.net"
|
||||
}
|
||||
|
||||
advertise {
|
||||
http = "hcp1.tailnet-68f9.ts.net:4646"
|
||||
rpc = "hcp1.tailnet-68f9.ts.net:4647"
|
||||
serf = "hcp1.tailnet-68f9.ts.net:4648"
|
||||
}
|
||||
|
||||
ports {
|
||||
http = 4646
|
||||
rpc = 4647
|
||||
serf = 4648
|
||||
}
|
||||
|
||||
# 纯客户端模式
|
||||
server {
|
||||
enabled = false
|
||||
}
|
||||
|
||||
client {
|
||||
enabled = true
|
||||
network_interface = "tailscale0"
|
||||
|
||||
# 连接到当前活跃的服务器节点
|
||||
servers = [
|
||||
"ch2.tailnet-68f9.ts.net:4647",
|
||||
"ash1d.tailnet-68f9.ts.net:4647",
|
||||
"ch3.tailnet-68f9.ts.net:4647",
|
||||
"de.tailnet-68f9.ts.net:4647",
|
||||
"semaphore.tailnet-68f9.ts.net:4647"
|
||||
]
|
||||
|
||||
# 基本驱动
|
||||
options {
|
||||
"driver.raw_exec.enable" = "1"
|
||||
"driver.exec.enable" = "1"
|
||||
}
|
||||
}
|
||||
|
||||
# 极简 Consul 配置
|
||||
consul {
|
||||
address = "127.0.0.1:8500"
|
||||
auto_advertise = true
|
||||
client_auto_join = true
|
||||
}
|
||||
|
||||
# 禁用 Vault - 清理垃圾配置
|
||||
vault {
|
||||
enabled = false
|
||||
}
|
||||
|
|
@ -0,0 +1,62 @@
|
|||
# Nomad 客户端节点极简配置模板
|
||||
datacenter = "dc1"
|
||||
data_dir = "/opt/nomad/data"
|
||||
log_level = "INFO"
|
||||
name = "influxdb"
|
||||
|
||||
bind_addr = "influxdb.tailnet-68f9.ts.net"
|
||||
|
||||
addresses {
|
||||
http = "influxdb.tailnet-68f9.ts.net"
|
||||
rpc = "influxdb.tailnet-68f9.ts.net"
|
||||
serf = "influxdb.tailnet-68f9.ts.net"
|
||||
}
|
||||
|
||||
advertise {
|
||||
http = "influxdb.tailnet-68f9.ts.net:4646"
|
||||
rpc = "influxdb.tailnet-68f9.ts.net:4647"
|
||||
serf = "influxdb.tailnet-68f9.ts.net:4648"
|
||||
}
|
||||
|
||||
ports {
|
||||
http = 4646
|
||||
rpc = 4647
|
||||
serf = 4648
|
||||
}
|
||||
|
||||
# 纯客户端模式
|
||||
server {
|
||||
enabled = false
|
||||
}
|
||||
|
||||
client {
|
||||
enabled = true
|
||||
network_interface = "tailscale0"
|
||||
|
||||
# 连接到当前活跃的服务器节点
|
||||
servers = [
|
||||
"ch2.tailnet-68f9.ts.net:4647",
|
||||
"ash1d.tailnet-68f9.ts.net:4647",
|
||||
"ch3.tailnet-68f9.ts.net:4647",
|
||||
"de.tailnet-68f9.ts.net:4647",
|
||||
"semaphore.tailnet-68f9.ts.net:4647"
|
||||
]
|
||||
|
||||
# 基本驱动
|
||||
options {
|
||||
"driver.raw_exec.enable" = "1"
|
||||
"driver.exec.enable" = "1"
|
||||
}
|
||||
}
|
||||
|
||||
# 极简 Consul 配置
|
||||
consul {
|
||||
address = "127.0.0.1:8500"
|
||||
auto_advertise = true
|
||||
client_auto_join = true
|
||||
}
|
||||
|
||||
# 禁用 Vault - 清理垃圾配置
|
||||
vault {
|
||||
enabled = false
|
||||
}
|
||||
|
|
@ -0,0 +1,62 @@
|
|||
# Nomad 客户端节点极简配置模板
|
||||
datacenter = "dc1"
|
||||
data_dir = "/opt/nomad/data"
|
||||
log_level = "INFO"
|
||||
name = "warden"
|
||||
|
||||
bind_addr = "warden.tailnet-68f9.ts.net"
|
||||
|
||||
addresses {
|
||||
http = "warden.tailnet-68f9.ts.net"
|
||||
rpc = "warden.tailnet-68f9.ts.net"
|
||||
serf = "warden.tailnet-68f9.ts.net"
|
||||
}
|
||||
|
||||
advertise {
|
||||
http = "warden.tailnet-68f9.ts.net:4646"
|
||||
rpc = "warden.tailnet-68f9.ts.net:4647"
|
||||
serf = "warden.tailnet-68f9.ts.net:4648"
|
||||
}
|
||||
|
||||
ports {
|
||||
http = 4646
|
||||
rpc = 4647
|
||||
serf = 4648
|
||||
}
|
||||
|
||||
# 纯客户端模式
|
||||
server {
|
||||
enabled = false
|
||||
}
|
||||
|
||||
client {
|
||||
enabled = true
|
||||
network_interface = "tailscale0"
|
||||
|
||||
# 连接到当前活跃的服务器节点
|
||||
servers = [
|
||||
"ch2.tailnet-68f9.ts.net:4647",
|
||||
"ash1d.tailnet-68f9.ts.net:4647",
|
||||
"ch3.tailnet-68f9.ts.net:4647",
|
||||
"de.tailnet-68f9.ts.net:4647",
|
||||
"semaphore.tailnet-68f9.ts.net:4647"
|
||||
]
|
||||
|
||||
# 基本驱动
|
||||
options {
|
||||
"driver.raw_exec.enable" = "1"
|
||||
"driver.exec.enable" = "1"
|
||||
}
|
||||
}
|
||||
|
||||
# 极简 Consul 配置
|
||||
consul {
|
||||
address = "127.0.0.1:8500"
|
||||
auto_advertise = true
|
||||
client_auto_join = true
|
||||
}
|
||||
|
||||
# 禁用 Vault - 清理垃圾配置
|
||||
vault {
|
||||
enabled = false
|
||||
}
|
||||
|
|
@ -0,0 +1,23 @@
|
|||
# Nomad 配置 - OpenTofu 管理
|
||||
|
||||
## 节点分配
|
||||
|
||||
### 服务器节点 (3个)
|
||||
- ash3c.tailnet-68f9.ts.net
|
||||
- ch4.tailnet-68f9.ts.net
|
||||
- warden.tailnet-68f9.ts.net
|
||||
|
||||
### 客户端节点 (4个)
|
||||
- hcp1.tailnet-68f9.ts.net
|
||||
- influxdb.tailnet-68f9.ts.net (influxdb1)
|
||||
- browser.tailnet-68f9.ts.net
|
||||
- ash1d.tailnet-68f9.ts.net
|
||||
|
||||
### 已删除节点
|
||||
- ash2e.tailnet-68f9.ts.net (实例被删除,需要重建)
|
||||
|
||||
## 配置原则
|
||||
- 极简配置,移除所有垃圾
|
||||
- 禁用 Vault (历史遗留问题)
|
||||
- 使用本地 Consul (127.0.0.1:8500)
|
||||
- 服务器节点也可运行作业 (client.enabled = true)
|
||||
|
|
@ -0,0 +1,82 @@
|
|||
# ash1d - 基于 onecloud1 成功配置直接替换节点名
|
||||
datacenter = "dc1"
|
||||
data_dir = "/opt/nomad/data"
|
||||
plugin_dir = "/opt/nomad/plugins"
|
||||
log_level = "INFO"
|
||||
name = "ash1d"
|
||||
|
||||
bind_addr = "ash1d.tailnet-68f9.ts.net"
|
||||
|
||||
addresses {
|
||||
http = "ash1d.tailnet-68f9.ts.net"
|
||||
rpc = "ash1d.tailnet-68f9.ts.net"
|
||||
serf = "ash1d.tailnet-68f9.ts.net"
|
||||
}
|
||||
|
||||
advertise {
|
||||
http = "ash1d.tailnet-68f9.ts.net:4646"
|
||||
rpc = "ash1d.tailnet-68f9.ts.net:4647"
|
||||
serf = "ash1d.tailnet-68f9.ts.net:4648"
|
||||
}
|
||||
|
||||
ports {
|
||||
http = 4646
|
||||
rpc = 4647
|
||||
serf = 4648
|
||||
}
|
||||
|
||||
# 服务器模式 + 七仙女发现配置
|
||||
server {
|
||||
enabled = true
|
||||
|
||||
server_join {
|
||||
retry_join = [
|
||||
"semaphore.tailnet-68f9.ts.net:4647",
|
||||
"ash1d.tailnet-68f9.ts.net:4647",
|
||||
"ch2.tailnet-68f9.ts.net:4647",
|
||||
"ch3.tailnet-68f9.ts.net:4647",
|
||||
"onecloud1.tailnet-68f9.ts.net:4647",
|
||||
"de.tailnet-68f9.ts.net:4647"
|
||||
]
|
||||
}
|
||||
}
|
||||
|
||||
# 服务器不运行作业
|
||||
client {
|
||||
enabled = false
|
||||
network_interface = "tailscale0"
|
||||
}
|
||||
|
||||
# Podman 插件
|
||||
plugin "nomad-driver-podman" {
|
||||
config {
|
||||
socket_path = "unix:///run/podman/podman.sock"
|
||||
volumes {
|
||||
enabled = true
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# 本地 Consul 客户端
|
||||
consul {
|
||||
address = "127.0.0.1:8500"
|
||||
server_service_name = "nomad"
|
||||
client_service_name = "nomad-client"
|
||||
auto_advertise = true
|
||||
server_auto_join = true
|
||||
client_auto_join = true
|
||||
}
|
||||
|
||||
# 禁用 Vault
|
||||
vault {
|
||||
enabled = false
|
||||
}
|
||||
|
||||
# 遥测配置
|
||||
telemetry {
|
||||
collection_interval = "1s"
|
||||
disable_hostname = false
|
||||
prometheus_metrics = true
|
||||
publish_allocation_metrics = true
|
||||
publish_node_metrics = true
|
||||
}
|
||||
|
|
@ -0,0 +1,68 @@
|
|||
# Nomad 客户端节点配置模板 - 基于 Ansible 配置优化,去除垃圾 meta 标签
|
||||
datacenter = "dc1"
|
||||
data_dir = "/opt/nomad/data"
|
||||
plugin_dir = "/opt/nomad/plugins"
|
||||
log_level = "INFO"
|
||||
name = "NODE_NAME"
|
||||
|
||||
bind_addr = "NODE_NAME.tailnet-68f9.ts.net"
|
||||
|
||||
addresses {
|
||||
http = "NODE_NAME.tailnet-68f9.ts.net"
|
||||
rpc = "NODE_NAME.tailnet-68f9.ts.net"
|
||||
serf = "NODE_NAME.tailnet-68f9.ts.net"
|
||||
}
|
||||
|
||||
advertise {
|
||||
http = "NODE_NAME.tailnet-68f9.ts.net:4646"
|
||||
rpc = "NODE_NAME.tailnet-68f9.ts.net:4647"
|
||||
serf = "NODE_NAME.tailnet-68f9.ts.net:4648"
|
||||
}
|
||||
|
||||
ports {
|
||||
http = 4646
|
||||
rpc = 4647
|
||||
serf = 4648
|
||||
}
|
||||
|
||||
server {
|
||||
enabled = false
|
||||
}
|
||||
|
||||
client {
|
||||
enabled = true
|
||||
|
||||
# 激进的垃圾清理策略 - 继承自 Ansible 配置
|
||||
gc_interval = "5m"
|
||||
gc_disk_usage_threshold = 80
|
||||
gc_inode_usage_threshold = 70
|
||||
}
|
||||
|
||||
# Podman 插件配置 - 继承自 Ansible 配置
|
||||
plugin "nomad-driver-podman" {
|
||||
config {
|
||||
socket_path = "unix:///run/podman/podman.sock"
|
||||
volumes {
|
||||
enabled = true
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# Consul 配置 - 继承自 Ansible 配置
|
||||
consul {
|
||||
address = "ch4.tailnet-68f9.ts.net:8500"
|
||||
server_service_name = "nomad"
|
||||
client_service_name = "nomad-client"
|
||||
auto_advertise = true
|
||||
server_auto_join = true
|
||||
client_auto_join = true
|
||||
}
|
||||
|
||||
# Vault 配置 - 继承自 Ansible 配置
|
||||
vault {
|
||||
enabled = true
|
||||
address = "http://ch4.tailnet-68f9.ts.net:8200,http://ash3c.tailnet-68f9.ts.net:8200,http://warden.tailnet-68f9.ts.net:8200"
|
||||
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
|
||||
create_from_role = "nomad-cluster"
|
||||
tls_skip_verify = true
|
||||
}
|
||||
|
|
@ -0,0 +1,70 @@
|
|||
# Nomad 客户端节点配置模板 - 基于现有 Ansible 配置
|
||||
datacenter = "dc1"
|
||||
data_dir = "/opt/nomad/data"
|
||||
plugin_dir = "/opt/nomad/plugins"
|
||||
log_level = "INFO"
|
||||
name = "NODE_NAME"
|
||||
|
||||
bind_addr = "NODE_NAME.tailnet-68f9.ts.net"
|
||||
|
||||
addresses {
|
||||
http = "NODE_NAME.tailnet-68f9.ts.net"
|
||||
rpc = "NODE_NAME.tailnet-68f9.ts.net"
|
||||
serf = "NODE_NAME.tailnet-68f9.ts.net"
|
||||
}
|
||||
|
||||
advertise {
|
||||
http = "NODE_NAME.tailnet-68f9.ts.net:4646"
|
||||
rpc = "NODE_NAME.tailnet-68f9.ts.net:4647"
|
||||
serf = "NODE_NAME.tailnet-68f9.ts.net:4648"
|
||||
}
|
||||
|
||||
ports {
|
||||
http = 4646
|
||||
rpc = 4647
|
||||
serf = 4648
|
||||
}
|
||||
|
||||
server {
|
||||
enabled = false
|
||||
}
|
||||
|
||||
client {
|
||||
enabled = true
|
||||
|
||||
meta {
|
||||
consul = "true"
|
||||
consul_version = "1.21.5"
|
||||
}
|
||||
|
||||
# 激进的垃圾清理策略
|
||||
gc_interval = "5m"
|
||||
gc_disk_usage_threshold = 80
|
||||
gc_inode_usage_threshold = 70
|
||||
}
|
||||
|
||||
plugin "nomad-driver-podman" {
|
||||
config {
|
||||
socket_path = "unix:///run/podman/podman.sock"
|
||||
volumes {
|
||||
enabled = true
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
consul {
|
||||
address = "ch4.tailnet-68f9.ts.net:8500"
|
||||
server_service_name = "nomad"
|
||||
client_service_name = "nomad-client"
|
||||
auto_advertise = true
|
||||
server_auto_join = true
|
||||
client_auto_join = true
|
||||
}
|
||||
|
||||
vault {
|
||||
enabled = true
|
||||
address = "http://ch4.tailnet-68f9.ts.net:8200,http://ash3c.tailnet-68f9.ts.net:8200,http://warden.tailnet-68f9.ts.net:8200"
|
||||
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
|
||||
create_from_role = "nomad-cluster"
|
||||
tls_skip_verify = true
|
||||
}
|
||||
|
|
@ -0,0 +1,87 @@
|
|||
# onecloud1 - 基于现有配置继承和扬弃
|
||||
# 继承:基础配置、网络配置、遥测配置
|
||||
# 扬弃:错误的服务器列表、Vault配置、客户端运行作业
|
||||
|
||||
datacenter = "dc1"
|
||||
data_dir = "/opt/nomad/data"
|
||||
plugin_dir = "/opt/nomad/plugins" # 继承:保留插件目录
|
||||
log_level = "DEBUG" # 启用调试模式
|
||||
name = "onecloud1"
|
||||
|
||||
bind_addr = "onecloud1.tailnet-68f9.ts.net"
|
||||
|
||||
addresses {
|
||||
http = "onecloud1.tailnet-68f9.ts.net"
|
||||
rpc = "onecloud1.tailnet-68f9.ts.net"
|
||||
serf = "onecloud1.tailnet-68f9.ts.net"
|
||||
}
|
||||
|
||||
advertise {
|
||||
http = "onecloud1.tailnet-68f9.ts.net:4646"
|
||||
rpc = "onecloud1.tailnet-68f9.ts.net:4647"
|
||||
serf = "onecloud1.tailnet-68f9.ts.net:4648"
|
||||
}
|
||||
|
||||
ports {
|
||||
http = 4646
|
||||
rpc = 4647
|
||||
serf = 4648
|
||||
}
|
||||
|
||||
# 继承:服务器模式,加上服务器发现配置
|
||||
server {
|
||||
enabled = true
|
||||
|
||||
# 七仙女服务器发现配置
|
||||
server_join {
|
||||
retry_join = [
|
||||
"semaphore.tailnet-68f9.ts.net:4647",
|
||||
"ash1d.tailnet-68f9.ts.net:4647",
|
||||
"ash2e.tailnet-68f9.ts.net:4647",
|
||||
"ch2.tailnet-68f9.ts.net:4647",
|
||||
"ch3.tailnet-68f9.ts.net:4647",
|
||||
"onecloud1.tailnet-68f9.ts.net:4647",
|
||||
"de.tailnet-68f9.ts.net:4647"
|
||||
]
|
||||
}
|
||||
}
|
||||
|
||||
# 扬弃:服务器不应该运行作业,但保留网络接口配置
|
||||
client {
|
||||
enabled = false
|
||||
network_interface = "tailscale0" # 继承:网络接口配置
|
||||
}
|
||||
|
||||
# 继承:Podman 插件配置
|
||||
plugin "nomad-driver-podman" {
|
||||
config {
|
||||
socket_path = "unix:///run/podman/podman.sock"
|
||||
volumes {
|
||||
enabled = true
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# 扬弃:错误的 Consul 地址,改为本地客户端 - 分层解耦
|
||||
consul {
|
||||
address = "127.0.0.1:8500" # 修改:使用本地 Consul 客户端
|
||||
server_service_name = "nomad"
|
||||
client_service_name = "nomad-client"
|
||||
auto_advertise = true
|
||||
server_auto_join = true
|
||||
client_auto_join = true
|
||||
}
|
||||
|
||||
# 扬弃:Vault 垃圾配置
|
||||
vault {
|
||||
enabled = false
|
||||
}
|
||||
|
||||
# 继承:遥测配置
|
||||
telemetry {
|
||||
collection_interval = "1s"
|
||||
disable_hostname = false
|
||||
prometheus_metrics = true
|
||||
publish_allocation_metrics = true
|
||||
publish_node_metrics = true
|
||||
}
|
||||
|
|
@ -0,0 +1,68 @@
|
|||
# Nomad 服务器节点安全配置模板
|
||||
datacenter = "dc1"
|
||||
data_dir = "/opt/nomad/data"
|
||||
plugin_dir = "/opt/nomad/plugins"
|
||||
log_level = "INFO"
|
||||
name = "NODE_NAME"
|
||||
|
||||
# 安全绑定 - 只绑定到 Tailscale 接口
|
||||
bind_addr = "NODE_NAME.tailnet-68f9.ts.net"
|
||||
|
||||
addresses {
|
||||
http = "NODE_NAME.tailnet-68f9.ts.net"
|
||||
rpc = "NODE_NAME.tailnet-68f9.ts.net"
|
||||
serf = "NODE_NAME.tailnet-68f9.ts.net"
|
||||
}
|
||||
|
||||
advertise {
|
||||
http = "NODE_NAME.tailnet-68f9.ts.net:4646"
|
||||
rpc = "NODE_NAME.tailnet-68f9.ts.net:4647"
|
||||
serf = "NODE_NAME.tailnet-68f9.ts.net:4648"
|
||||
}
|
||||
|
||||
ports {
|
||||
http = 4646
|
||||
rpc = 4647
|
||||
serf = 4648
|
||||
}
|
||||
|
||||
server {
|
||||
enabled = true
|
||||
|
||||
# 七仙女服务器发现配置
|
||||
server_join {
|
||||
retry_join = [
|
||||
"semaphore.tailnet-68f9.ts.net:4647",
|
||||
"ash1d.tailnet-68f9.ts.net:4647",
|
||||
"ash2e.tailnet-68f9.ts.net:4647",
|
||||
"ch2.tailnet-68f9.ts.net:4647",
|
||||
"ch3.tailnet-68f9.ts.net:4647",
|
||||
"onecloud1.tailnet-68f9.ts.net:4647",
|
||||
"de.tailnet-68f9.ts.net:4647"
|
||||
]
|
||||
}
|
||||
}
|
||||
|
||||
# 安全的 Consul 配置 - 指向本地客户端
|
||||
consul {
|
||||
address = "127.0.0.1:8500"
|
||||
server_service_name = "nomad"
|
||||
client_service_name = "nomad-client"
|
||||
auto_advertise = true
|
||||
server_auto_join = true
|
||||
client_auto_join = true
|
||||
}
|
||||
|
||||
# 安全的 Vault 配置 - 指向本地代理
|
||||
vault {
|
||||
enabled = false # 暂时禁用,等 Vault 集群部署完成
|
||||
}
|
||||
|
||||
# 遥测配置
|
||||
telemetry {
|
||||
collection_interval = "1s"
|
||||
disable_hostname = false
|
||||
prometheus_metrics = true
|
||||
publish_allocation_metrics = true
|
||||
publish_node_metrics = true
|
||||
}
|
||||
|
|
@ -0,0 +1,57 @@
|
|||
# Nomad 服务器节点极简配置模板
|
||||
datacenter = "dc1"
|
||||
data_dir = "/opt/nomad/data"
|
||||
log_level = "INFO"
|
||||
name = "NODE_NAME"
|
||||
|
||||
bind_addr = "NODE_NAME.tailnet-68f9.ts.net"
|
||||
|
||||
addresses {
|
||||
http = "NODE_NAME.tailnet-68f9.ts.net"
|
||||
rpc = "NODE_NAME.tailnet-68f9.ts.net"
|
||||
serf = "NODE_NAME.tailnet-68f9.ts.net"
|
||||
}
|
||||
|
||||
advertise {
|
||||
http = "NODE_NAME.tailnet-68f9.ts.net:4646"
|
||||
rpc = "NODE_NAME.tailnet-68f9.ts.net:4647"
|
||||
serf = "NODE_NAME.tailnet-68f9.ts.net:4648"
|
||||
}
|
||||
|
||||
ports {
|
||||
http = 4646
|
||||
rpc = 4647
|
||||
serf = 4648
|
||||
}
|
||||
|
||||
# 服务器模式
|
||||
server {
|
||||
enabled = true
|
||||
bootstrap_expect = 3
|
||||
server_join {
|
||||
retry_join = [
|
||||
"ash3c.tailnet-68f9.ts.net",
|
||||
"ch4.tailnet-68f9.ts.net",
|
||||
"warden.tailnet-68f9.ts.net"
|
||||
]
|
||||
}
|
||||
}
|
||||
|
||||
# 客户端也启用,服务器可以运行作业
|
||||
client {
|
||||
enabled = true
|
||||
network_interface = "tailscale0"
|
||||
}
|
||||
|
||||
# 极简 Consul 配置
|
||||
consul {
|
||||
address = "127.0.0.1:8500"
|
||||
auto_advertise = true
|
||||
server_auto_join = true
|
||||
client_auto_join = true
|
||||
}
|
||||
|
||||
# 禁用 Vault - 清理垃圾配置
|
||||
vault {
|
||||
enabled = false
|
||||
}
|
||||
|
|
@ -54,6 +54,16 @@ client {
|
|||
read_only = false
|
||||
}
|
||||
|
||||
host_volume "grafana-data" {
|
||||
path = "/opt/nomad/data/grafana-data"
|
||||
read_only = false
|
||||
}
|
||||
|
||||
host_volume "prometheus-data" {
|
||||
path = "/opt/nomad/data/prometheus-data"
|
||||
read_only = false
|
||||
}
|
||||
|
||||
# 禁用Docker驱动,只使用Podman
|
||||
options {
|
||||
"driver.raw_exec.enable" = "1"
|
||||
|
|
@ -82,23 +92,6 @@ plugin "nomad-driver-podman" {
|
|||
}
|
||||
}
|
||||
|
||||
consul {
|
||||
address = "ch4.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500"
|
||||
server_service_name = "nomad"
|
||||
client_service_name = "nomad-client"
|
||||
auto_advertise = true
|
||||
server_auto_join = false
|
||||
client_auto_join = true
|
||||
}
|
||||
|
||||
vault {
|
||||
enabled = true
|
||||
address = "http://master.tailnet-68f9.ts.net:8200,http://ash3c.tailnet-68f9.ts.net:8200,http://influxdb1.tailnet-68f9.ts.net:8200"
|
||||
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
|
||||
create_from_role = "nomad-cluster"
|
||||
tls_skip_verify = true
|
||||
}
|
||||
|
||||
telemetry {
|
||||
collection_interval = "1s"
|
||||
disable_hostname = false
|
||||
|
|
|
|||
|
|
@ -0,0 +1,78 @@
|
|||
# OpenTofu 小王 - 修复不安全的服务器配置
|
||||
# terraform 块已在 onecloud1-deploy-clean.tf 中定义
|
||||
|
||||
# 需要修复的不安全服务器节点
|
||||
variable "insecure_servers" {
|
||||
type = list(string)
|
||||
default = [
|
||||
"ash1d",
|
||||
"ash2e"
|
||||
]
|
||||
}
|
||||
|
||||
# 为每个服务器节点生成安全配置文件
|
||||
resource "local_file" "secure_server_configs" {
|
||||
for_each = toset(var.insecure_servers)
|
||||
|
||||
filename = "${path.module}/generated/${each.key}-server-secure.hcl"
|
||||
content = replace(
|
||||
file("${path.module}/../nomad-configs-tofu/server-template-secure.hcl"),
|
||||
"NODE_NAME",
|
||||
each.key
|
||||
)
|
||||
}
|
||||
|
||||
# 部署安全配置到每个服务器节点
|
||||
resource "null_resource" "fix_insecure_servers" {
|
||||
for_each = toset(var.insecure_servers)
|
||||
|
||||
depends_on = [local_file.secure_server_configs]
|
||||
|
||||
provisioner "local-exec" {
|
||||
command = <<EOF
|
||||
echo "=== 修复 ${each.key} 的不安全配置 ==="
|
||||
echo "开始时间: $(date)"
|
||||
|
||||
echo "1. 测试连接 ${each.key}..."
|
||||
ping -c 1 ${each.key}.tailnet-68f9.ts.net || echo " - ${each.key} ping 失败"
|
||||
|
||||
echo "2. 上传安全配置文件..."
|
||||
sshpass -p '3131' scp -o StrictHostKeyChecking=no -o ConnectTimeout=5 \
|
||||
${path.module}/generated/${each.key}-server-secure.hcl \
|
||||
ben@${each.key}.tailnet-68f9.ts.net:/tmp/nomad-secure.hcl && echo " - 文件上传成功" || echo " - 文件上传失败"
|
||||
|
||||
echo "3. 备份旧配置并部署安全配置..."
|
||||
sshpass -p '3131' ssh -o StrictHostKeyChecking=no -o ConnectTimeout=5 \
|
||||
ben@${each.key}.tailnet-68f9.ts.net \
|
||||
"echo '=== ${each.key} 安全配置部署开始 ==='; \
|
||||
echo '3131' | sudo -S systemctl stop nomad; \
|
||||
echo '备份不安全的配置...'; \
|
||||
echo '3131' | sudo -S cp /etc/nomad.d/nomad.hcl /etc/nomad.d/nomad.hcl.insecure.backup.\$(date +%Y%m%d_%H%M%S); \
|
||||
echo '部署安全配置...'; \
|
||||
echo '3131' | sudo -S cp /tmp/nomad-secure.hcl /etc/nomad.d/nomad.hcl; \
|
||||
echo '清理 Raft 数据以重新加入集群...'; \
|
||||
echo '3131' | sudo -S rm -rf /opt/nomad/data/server/raft/; \
|
||||
echo '启动服务...'; \
|
||||
echo '3131' | sudo -S systemctl start nomad; \
|
||||
sleep 10; \
|
||||
echo '检查服务状态...'; \
|
||||
echo '3131' | sudo -S systemctl status nomad --no-pager; \
|
||||
echo '=== ${each.key} 安全配置部署完成 ==='" && echo " - ${each.key} 安全修复成功" || echo " - ${each.key} 安全修复失败"
|
||||
|
||||
echo "=== ${each.key} 安全修复完成!时间: $(date) ==="
|
||||
EOF
|
||||
}
|
||||
|
||||
triggers = {
|
||||
config_hash = local_file.secure_server_configs[each.key].content_md5
|
||||
deploy_time = timestamp()
|
||||
}
|
||||
}
|
||||
|
||||
output "security_fix_summary" {
|
||||
value = {
|
||||
fixed_servers = var.insecure_servers
|
||||
config_files = [for server in var.insecure_servers : "${server}-server-secure.hcl"]
|
||||
deploy_time = timestamp()
|
||||
}
|
||||
}
|
||||
|
|
@ -0,0 +1,68 @@
|
|||
# Nomad 服务器节点安全配置模板
|
||||
datacenter = "dc1"
|
||||
data_dir = "/opt/nomad/data"
|
||||
plugin_dir = "/opt/nomad/plugins"
|
||||
log_level = "INFO"
|
||||
name = "ash1d"
|
||||
|
||||
# 安全绑定 - 只绑定到 Tailscale 接口
|
||||
bind_addr = "ash1d.tailnet-68f9.ts.net"
|
||||
|
||||
addresses {
|
||||
http = "ash1d.tailnet-68f9.ts.net"
|
||||
rpc = "ash1d.tailnet-68f9.ts.net"
|
||||
serf = "ash1d.tailnet-68f9.ts.net"
|
||||
}
|
||||
|
||||
advertise {
|
||||
http = "ash1d.tailnet-68f9.ts.net:4646"
|
||||
rpc = "ash1d.tailnet-68f9.ts.net:4647"
|
||||
serf = "ash1d.tailnet-68f9.ts.net:4648"
|
||||
}
|
||||
|
||||
ports {
|
||||
http = 4646
|
||||
rpc = 4647
|
||||
serf = 4648
|
||||
}
|
||||
|
||||
server {
|
||||
enabled = true
|
||||
|
||||
# 七仙女服务器发现配置
|
||||
server_join {
|
||||
retry_join = [
|
||||
"semaphore.tailnet-68f9.ts.net:4647",
|
||||
"ash1d.tailnet-68f9.ts.net:4647",
|
||||
"ash2e.tailnet-68f9.ts.net:4647",
|
||||
"ch2.tailnet-68f9.ts.net:4647",
|
||||
"ch3.tailnet-68f9.ts.net:4647",
|
||||
"onecloud1.tailnet-68f9.ts.net:4647",
|
||||
"de.tailnet-68f9.ts.net:4647"
|
||||
]
|
||||
}
|
||||
}
|
||||
|
||||
# 安全的 Consul 配置 - 指向本地客户端
|
||||
consul {
|
||||
address = "127.0.0.1:8500"
|
||||
server_service_name = "nomad"
|
||||
client_service_name = "nomad-client"
|
||||
auto_advertise = true
|
||||
server_auto_join = true
|
||||
client_auto_join = true
|
||||
}
|
||||
|
||||
# 安全的 Vault 配置 - 指向本地代理
|
||||
vault {
|
||||
enabled = false # 暂时禁用,等 Vault 集群部署完成
|
||||
}
|
||||
|
||||
# 遥测配置
|
||||
telemetry {
|
||||
collection_interval = "1s"
|
||||
disable_hostname = false
|
||||
prometheus_metrics = true
|
||||
publish_allocation_metrics = true
|
||||
publish_node_metrics = true
|
||||
}
|
||||
|
|
@ -0,0 +1,68 @@
|
|||
# Nomad 服务器节点安全配置模板
|
||||
datacenter = "dc1"
|
||||
data_dir = "/opt/nomad/data"
|
||||
plugin_dir = "/opt/nomad/plugins"
|
||||
log_level = "INFO"
|
||||
name = "ash2e"
|
||||
|
||||
# 安全绑定 - 只绑定到 Tailscale 接口
|
||||
bind_addr = "ash2e.tailnet-68f9.ts.net"
|
||||
|
||||
addresses {
|
||||
http = "ash2e.tailnet-68f9.ts.net"
|
||||
rpc = "ash2e.tailnet-68f9.ts.net"
|
||||
serf = "ash2e.tailnet-68f9.ts.net"
|
||||
}
|
||||
|
||||
advertise {
|
||||
http = "ash2e.tailnet-68f9.ts.net:4646"
|
||||
rpc = "ash2e.tailnet-68f9.ts.net:4647"
|
||||
serf = "ash2e.tailnet-68f9.ts.net:4648"
|
||||
}
|
||||
|
||||
ports {
|
||||
http = 4646
|
||||
rpc = 4647
|
||||
serf = 4648
|
||||
}
|
||||
|
||||
server {
|
||||
enabled = true
|
||||
|
||||
# 七仙女服务器发现配置
|
||||
server_join {
|
||||
retry_join = [
|
||||
"semaphore.tailnet-68f9.ts.net:4647",
|
||||
"ash1d.tailnet-68f9.ts.net:4647",
|
||||
"ash2e.tailnet-68f9.ts.net:4647",
|
||||
"ch2.tailnet-68f9.ts.net:4647",
|
||||
"ch3.tailnet-68f9.ts.net:4647",
|
||||
"onecloud1.tailnet-68f9.ts.net:4647",
|
||||
"de.tailnet-68f9.ts.net:4647"
|
||||
]
|
||||
}
|
||||
}
|
||||
|
||||
# 安全的 Consul 配置 - 指向本地客户端
|
||||
consul {
|
||||
address = "127.0.0.1:8500"
|
||||
server_service_name = "nomad"
|
||||
client_service_name = "nomad-client"
|
||||
auto_advertise = true
|
||||
server_auto_join = true
|
||||
client_auto_join = true
|
||||
}
|
||||
|
||||
# 安全的 Vault 配置 - 指向本地代理
|
||||
vault {
|
||||
enabled = false # 暂时禁用,等 Vault 集群部署完成
|
||||
}
|
||||
|
||||
# 遥测配置
|
||||
telemetry {
|
||||
collection_interval = "1s"
|
||||
disable_hostname = false
|
||||
prometheus_metrics = true
|
||||
publish_allocation_metrics = true
|
||||
publish_node_metrics = true
|
||||
}
|
||||
|
|
@ -0,0 +1,79 @@
|
|||
# OpenTofu 小王 - onecloud1 配置部署
|
||||
terraform {
|
||||
required_providers {
|
||||
null = {
|
||||
source = "registry.opentofu.org/hashicorp/null"
|
||||
version = "3.2.4"
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# 用小王的 local-exec 执行 SSH 命令
|
||||
resource "null_resource" "onecloud1_deploy_via_ssh" {
|
||||
provisioner "local-exec" {
|
||||
command = <<EOF
|
||||
echo "=== OpenTofu 小王开始部署 onecloud1 ==="
|
||||
echo "开始时间: $(date)"
|
||||
|
||||
echo "1. 部署 ash1d 服务器配置..."
|
||||
echo " - 测试连接 ash1d..."
|
||||
ping -c 1 ash1d.tailnet-68f9.ts.net || echo " - ash1d ping 失败"
|
||||
|
||||
echo " - 上传配置文件..."
|
||||
sshpass -p '3131' scp -o StrictHostKeyChecking=no -o ConnectTimeout=5 \
|
||||
../nomad-configs-tofu/ash1d-server.hcl \
|
||||
ben@ash1d.tailnet-68f9.ts.net:/tmp/nomad-new.hcl && echo " - 文件上传成功" || echo " - 文件上传失败"
|
||||
|
||||
echo " - 执行配置部署..."
|
||||
sshpass -p '3131' ssh -o StrictHostKeyChecking=no -o ConnectTimeout=5 \
|
||||
ben@ash1d.tailnet-68f9.ts.net \
|
||||
"echo '=== ash1d 配置部署开始 ==='; \
|
||||
echo '3131' | sudo -S systemctl stop nomad; \
|
||||
echo '清理 Raft 数据...'; \
|
||||
echo '3131' | sudo -S rm -rf /opt/nomad/data/server/raft/; \
|
||||
echo '替换配置文件...'; \
|
||||
echo '3131' | sudo -S cp /tmp/nomad-new.hcl /etc/nomad.d/nomad.hcl; \
|
||||
echo '启动服务...'; \
|
||||
echo '3131' | sudo -S systemctl start nomad; \
|
||||
sleep 5; \
|
||||
echo '检查服务状态...'; \
|
||||
echo '3131' | sudo -S systemctl status nomad --no-pager; \
|
||||
echo '=== ash1d 部署完成 ==='" && echo " - ash1d 部署成功" || echo " - ash1d 部署失败"
|
||||
|
||||
echo "2. 部署 onecloud1 服务器配置..."
|
||||
echo " - 测试连接 onecloud1..."
|
||||
ping -c 1 onecloud1.tailnet-68f9.ts.net || echo " - onecloud1 ping 失败"
|
||||
|
||||
echo " - 上传配置文件..."
|
||||
sshpass -p '3131' scp -o StrictHostKeyChecking=no -o ConnectTimeout=5 \
|
||||
../nomad-configs-tofu/onecloud1-server.hcl \
|
||||
ben@onecloud1.tailnet-68f9.ts.net:/tmp/nomad-new.hcl && echo " - 文件上传成功" || echo " - 文件上传失败"
|
||||
|
||||
echo " - 执行配置部署..."
|
||||
sshpass -p '3131' ssh -o StrictHostKeyChecking=no -o ConnectTimeout=5 \
|
||||
ben@onecloud1.tailnet-68f9.ts.net \
|
||||
"echo '=== onecloud1 配置部署开始 ==='; \
|
||||
echo '3131' | sudo -S systemctl stop nomad; \
|
||||
echo '清理 Raft 数据...'; \
|
||||
echo '3131' | sudo -S rm -rf /opt/nomad/data/server/raft/; \
|
||||
echo '替换配置文件...'; \
|
||||
echo '3131' | sudo -S cp /tmp/nomad-new.hcl /etc/nomad.d/nomad.hcl; \
|
||||
echo '启动服务...'; \
|
||||
echo '3131' | sudo -S systemctl start nomad; \
|
||||
sleep 5; \
|
||||
echo '检查服务状态...'; \
|
||||
echo '3131' | sudo -S systemctl status nomad --no-pager; \
|
||||
echo '=== onecloud1 部署完成 ==='" && echo " - onecloud1 部署成功" || echo " - onecloud1 部署失败"
|
||||
|
||||
echo "=== 小王部署完成!时间: $(date) ==="
|
||||
EOF
|
||||
}
|
||||
|
||||
triggers = {
|
||||
deploy_time = timestamp()
|
||||
}
|
||||
}
|
||||
|
||||
output "deploy_result" {
|
||||
value = "OpenTofu 小王已完成 onecloud1 部署"
|
||||
}
|
||||
|
|
@ -0,0 +1,142 @@
|
|||
# HCP 技术栈可观察性架构规划
|
||||
|
||||
## 📋 当前状态总结
|
||||
|
||||
### ✅ 已完成的工作
|
||||
1. **ash2e 实例重建** - Ubuntu 24.04 LTS,IPv6 支持
|
||||
2. **HashiCorp 工具安装** - Consul, Nomad, Vault 原生客户端
|
||||
3. **现代化工具配置** - zsh + oh-my-zsh, tree, htop, neofetch
|
||||
4. **Tailscale 网络** - 已加入网络,hostname 正确
|
||||
5. **SSH 配置** - ben 用户无密码登录
|
||||
6. **Ansible 配置** - inventory 已更新
|
||||
|
||||
### ❌ 当前问题
|
||||
1. **监控架构混乱** - InfluxDB + Telegraf + Grafana 混合架构
|
||||
2. **Nomad 驱动问题** - Podman 驱动未正确配置
|
||||
3. **服务部署失败** - 监控栈无法部署到 Nomad
|
||||
|
||||
## 🎯 目标架构
|
||||
|
||||
### 监控技术栈
|
||||
```
|
||||
┌─────────────────┬─────────────────┬─────────────────┐
|
||||
│ 数据收集层 │ 数据存储层 │ 可视化层 │
|
||||
├─────────────────┼─────────────────┼─────────────────┤
|
||||
│ node_exporter │ Prometheus │ Grafana │
|
||||
│ consul_exporter │ InfluxDB 2.x │ (保留现有) │
|
||||
│ nomad_exporter │ (保留现有) │ │
|
||||
│ vault_exporter │ │ │
|
||||
└─────────────────┴─────────────────┴─────────────────┘
|
||||
```
|
||||
|
||||
### 部署策略
|
||||
- **容器化部署** - 所有监控服务通过 Podman 容器化
|
||||
- **Nomad 管理** - 使用 Nomad 作业管理所有监控服务
|
||||
- **PVE 高可用** - 利用 Proxmox 的 HA 特性,避免软件层复杂化
|
||||
|
||||
## 📝 下一步行动计划
|
||||
|
||||
### 优先级 1: 修复 Nomad Podman 驱动
|
||||
```bash
|
||||
# 检查所有节点的 Podman 驱动配置
|
||||
nomad node status -self
|
||||
nomad node config -self
|
||||
|
||||
# 确保 Podman socket 正确配置
|
||||
systemctl status podman.socket
|
||||
ls -la /run/podman/podman.sock
|
||||
```
|
||||
|
||||
### 优先级 2: 部署监控栈
|
||||
```bash
|
||||
# 部署 Grafana + Prometheus + Node Exporter
|
||||
nomad job run monitoring-stack.nomad
|
||||
|
||||
# 验证服务状态
|
||||
nomad job status monitoring-stack
|
||||
nomad service list
|
||||
```
|
||||
|
||||
### 优先级 3: 配置数据源集成
|
||||
```yaml
|
||||
# Grafana 数据源配置
|
||||
datasources:
|
||||
- Prometheus (主要)
|
||||
- InfluxDB (保留现有数据)
|
||||
```
|
||||
|
||||
### 优先级 4: 扩展监控指标
|
||||
```yaml
|
||||
# 监控目标
|
||||
- Consul 集群健康
|
||||
- Nomad 作业状态
|
||||
- Vault 密钥轮换
|
||||
- Traefik 路由指标
|
||||
- 系统资源使用
|
||||
```
|
||||
|
||||
## 🔧 技术细节
|
||||
|
||||
### Nomad 作业配置
|
||||
- **驱动**: `podman` (替代 `exec`)
|
||||
- **网络**: Tailscale 网络集成
|
||||
- **存储**: Host volumes 持久化
|
||||
- **服务发现**: Consul 自动注册
|
||||
|
||||
### 监控指标
|
||||
- **系统指标**: CPU, 内存, 磁盘, 网络
|
||||
- **服务指标**: Consul, Nomad, Vault 健康状态
|
||||
- **业务指标**: 应用性能, 错误率, 响应时间
|
||||
|
||||
### 告警规则
|
||||
- **系统告警**: 资源使用率 > 80%
|
||||
- **服务告警**: 服务健康检查失败
|
||||
- **业务告警**: 关键指标异常
|
||||
|
||||
## 📊 预期成果
|
||||
|
||||
### 短期目标 (1-2 周)
|
||||
- ✅ 统一监控架构
|
||||
- ✅ 所有服务容器化
|
||||
- ✅ 基础监控仪表板
|
||||
|
||||
### 中期目标 (1 个月)
|
||||
- ✅ 完整监控覆盖
|
||||
- ✅ 告警规则配置
|
||||
- ✅ 性能优化
|
||||
|
||||
### 长期目标 (3 个月)
|
||||
- ✅ 自动化运维
|
||||
- ✅ 预测性监控
|
||||
- ✅ 成本优化
|
||||
|
||||
## 🚨 风险与挑战
|
||||
|
||||
### 技术风险
|
||||
- **数据迁移** - InfluxDB 现有数据保留
|
||||
- **服务中断** - 监控服务切换期间
|
||||
- **性能影响** - 监控服务资源消耗
|
||||
|
||||
### 解决方案
|
||||
- **渐进式迁移** - 逐步替换现有监控
|
||||
- **备份策略** - 关键数据备份
|
||||
- **资源监控** - 监控服务自身监控
|
||||
|
||||
## 📚 相关文档
|
||||
|
||||
### 配置文件
|
||||
- `monitoring-stack.nomad` - 监控栈 Nomad 作业
|
||||
- `prometheus.yml` - Prometheus 配置
|
||||
- `grafana-datasources.yml` - Grafana 数据源
|
||||
|
||||
### 参考资源
|
||||
- [Prometheus 官方文档](https://prometheus.io/docs/)
|
||||
- [Grafana 官方文档](https://grafana.com/docs/)
|
||||
- [Nomad Podman 驱动](https://developer.hashicorp.com/nomad/docs/drivers/podman)
|
||||
|
||||
---
|
||||
|
||||
**移交时间**: 2025-10-10 02:40 UTC
|
||||
**当前状态**: 监控栈部署遇到 Podman 驱动问题
|
||||
**下一步**: 修复 Nomad Podman 驱动配置
|
||||
**负责人**: Next Session
|
||||
|
|
@ -0,0 +1,101 @@
|
|||
# 当前会话工作总结
|
||||
|
||||
## 🎯 主要成就
|
||||
|
||||
### ✅ ash2e 实例重建
|
||||
- **操作系统**: Ubuntu 24.04 LTS
|
||||
- **IPv6 支持**: 自动分配,与现有实例同子网
|
||||
- **SSH 配置**: ben 用户无密码登录
|
||||
- **现代化工具**: zsh + oh-my-zsh, tree, htop, neofetch
|
||||
- **HashiCorp 工具**: Consul, Nomad, Vault 原生客户端
|
||||
|
||||
### ✅ 系统优化
|
||||
- **内核更新**: 从 6.14.0-1012 升级到 6.14.0-1013
|
||||
- **系统重启**: 应用内核更新,确保系统一致性
|
||||
- **Tailscale 网络**: 已加入网络,hostname 正确
|
||||
|
||||
### ✅ 监控架构规划
|
||||
- **技术栈选择**: Prometheus + Grafana + Node Exporter
|
||||
- **部署策略**: 容器化 + Nomad 管理
|
||||
- **高可用方案**: 利用 PVE 硬件层 HA
|
||||
|
||||
## ❌ 当前阻塞问题
|
||||
|
||||
### Nomad Podman 驱动问题
|
||||
```bash
|
||||
# 错误信息
|
||||
Constraint "missing drivers": 6 nodes excluded by filter
|
||||
```
|
||||
|
||||
**问题分析**:
|
||||
- Nomad 无法识别 Podman 驱动
|
||||
- 需要检查所有节点的 Podman 配置
|
||||
- 可能需要重新配置 Nomad 客户端
|
||||
|
||||
## 📋 待完成任务
|
||||
|
||||
### 优先级 1: 修复 Nomad 驱动
|
||||
- [ ] 检查所有节点的 Podman 驱动配置
|
||||
- [ ] 验证 Podman socket 状态
|
||||
- [ ] 重新配置 Nomad 客户端
|
||||
|
||||
### 优先级 2: 部署监控栈
|
||||
- [ ] 部署 Grafana + Prometheus + Node Exporter
|
||||
- [ ] 配置数据源集成
|
||||
- [ ] 验证服务状态
|
||||
|
||||
### 优先级 3: 监控扩展
|
||||
- [ ] 添加 Consul/Nomad/Vault 监控
|
||||
- [ ] 配置告警规则
|
||||
- [ ] 创建监控仪表板
|
||||
|
||||
## 🔧 技术债务
|
||||
|
||||
### 配置问题
|
||||
- **InfluxDB 架构**: 当前单点部署,需要容器化
|
||||
- **监控混乱**: Telegraf + InfluxDB + Grafana 混合架构
|
||||
- **驱动配置**: Nomad Podman 驱动未正确配置
|
||||
|
||||
### 架构改进
|
||||
- **统一部署**: 所有服务通过 Nomad 管理
|
||||
- **容器化**: 使用 Podman 替代直接安装
|
||||
- **标准化**: 统一监控指标和告警
|
||||
|
||||
## 📊 性能指标
|
||||
|
||||
### 系统状态
|
||||
- **ash2e 实例**: ✅ 运行正常
|
||||
- **内存使用**: 370MB/956MB (38%)
|
||||
- **磁盘使用**: 8.9GB/20GB (48%)
|
||||
- **网络连接**: ✅ Tailscale 正常
|
||||
|
||||
### 服务状态
|
||||
- **Consul**: ✅ 集群健康
|
||||
- **Nomad**: ✅ 节点就绪
|
||||
- **Vault**: ✅ 服务正常
|
||||
- **InfluxDB**: ✅ 运行稳定
|
||||
|
||||
## 🚀 下一步建议
|
||||
|
||||
### 立即行动
|
||||
1. **修复 Podman 驱动** - 检查所有节点配置
|
||||
2. **重新部署监控栈** - 使用修复后的配置
|
||||
3. **验证服务状态** - 确保所有服务正常运行
|
||||
|
||||
### 中期规划
|
||||
1. **监控扩展** - 添加更多监控指标
|
||||
2. **告警配置** - 设置关键指标告警
|
||||
3. **仪表板优化** - 创建业务监控面板
|
||||
|
||||
### 长期目标
|
||||
1. **自动化运维** - 基于监控的自动响应
|
||||
2. **性能优化** - 基于数据的系统优化
|
||||
3. **成本控制** - 资源使用优化
|
||||
|
||||
---
|
||||
|
||||
**会话结束时间**: 2025-10-10 02:40 UTC
|
||||
**总工作时长**: 约 2 小时
|
||||
**主要成果**: ash2e 实例重建 + 监控架构规划
|
||||
**阻塞问题**: Nomad Podman 驱动配置
|
||||
**移交状态**: 准备就绪,等待下一会话继续
|
||||
|
|
@ -0,0 +1,56 @@
|
|||
global:
|
||||
scrape_interval: 15s
|
||||
evaluation_interval: 15s
|
||||
|
||||
rule_files:
|
||||
# - "first_rules.yml"
|
||||
# - "second_rules.yml"
|
||||
|
||||
scrape_configs:
|
||||
# Prometheus 自身监控
|
||||
- job_name: 'prometheus'
|
||||
static_configs:
|
||||
- targets: ['localhost:9090']
|
||||
|
||||
# Node Exporter 监控
|
||||
- job_name: 'node-exporter'
|
||||
static_configs:
|
||||
- targets: ['node-exporter.tailnet-68f9.ts.net:9100']
|
||||
|
||||
# Consul 监控
|
||||
- job_name: 'consul'
|
||||
static_configs:
|
||||
- targets:
|
||||
- 'ch4.tailnet-68f9.ts.net:8500'
|
||||
- 'ash3c.tailnet-68f9.ts.net:8500'
|
||||
- 'warden.tailnet-68f9.ts.net:8500'
|
||||
|
||||
# Nomad 监控
|
||||
- job_name: 'nomad'
|
||||
static_configs:
|
||||
- targets:
|
||||
- 'semaphore.tailnet-68f9.ts.net:4646'
|
||||
- 'ash1d.tailnet-68f9.ts.net:4646'
|
||||
- 'ash2e.tailnet-68f9.ts.net:4646'
|
||||
- 'ch2.tailnet-68f9.ts.net:4646'
|
||||
- 'ch3.tailnet-68f9.ts.net:4646'
|
||||
- 'onecloud1.tailnet-68f9.ts.net:4646'
|
||||
- 'de.tailnet-68f9.ts.net:4646'
|
||||
|
||||
# Vault 监控
|
||||
- job_name: 'vault'
|
||||
static_configs:
|
||||
- targets:
|
||||
- 'master.tailnet-68f9.ts.net:8200'
|
||||
- 'ash3c.tailnet-68f9.ts.net:8200'
|
||||
- 'hcp1.tailnet-68f9.ts.net:8200'
|
||||
|
||||
# InfluxDB 监控
|
||||
- job_name: 'influxdb'
|
||||
static_configs:
|
||||
- targets: ['influxdb1.tailnet-68f9.ts.net:8086']
|
||||
|
||||
# Traefik 监控
|
||||
- job_name: 'traefik'
|
||||
static_configs:
|
||||
- targets: ['hcp1.tailnet-68f9.ts.net:8080']
|
||||
|
|
@ -1,69 +0,0 @@
|
|||
---
|
||||
all:
|
||||
children:
|
||||
pve_cluster:
|
||||
hosts:
|
||||
nuc12:
|
||||
ansible_host: nuc12
|
||||
ansible_user: root
|
||||
ansible_ssh_pass: "Aa313131@ben"
|
||||
ansible_ssh_common_args: '-o StrictHostKeyChecking=no'
|
||||
xgp:
|
||||
ansible_host: xgp
|
||||
ansible_user: root
|
||||
ansible_ssh_pass: "Aa313131@ben"
|
||||
ansible_ssh_common_args: '-o StrictHostKeyChecking=no'
|
||||
pve:
|
||||
ansible_host: pve
|
||||
ansible_user: root
|
||||
ansible_ssh_pass: "Aa313131@ben"
|
||||
ansible_ssh_common_args: '-o StrictHostKeyChecking=no'
|
||||
vars:
|
||||
ansible_python_interpreter: /usr/bin/python3
|
||||
|
||||
nomad_cluster:
|
||||
hosts:
|
||||
ch4:
|
||||
ansible_host: ch4.tailnet-68f9.ts.net
|
||||
ansible_user: root
|
||||
ansible_ssh_private_key_file: ~/.ssh/id_ed25519
|
||||
ansible_ssh_common_args: '-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null'
|
||||
hcp1:
|
||||
ansible_host: hcp1.tailnet-68f9.ts.net
|
||||
ansible_user: root
|
||||
ansible_ssh_private_key_file: ~/.ssh/id_ed25519
|
||||
ansible_ssh_common_args: '-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null'
|
||||
ash3c:
|
||||
ansible_host: ash3c.tailnet-68f9.ts.net
|
||||
ansible_user: root
|
||||
ansible_ssh_private_key_file: ~/.ssh/id_ed25519
|
||||
ansible_ssh_common_args: '-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null'
|
||||
warden:
|
||||
ansible_host: warden.tailnet-68f9.ts.net
|
||||
ansible_user: ben
|
||||
ansible_ssh_pass: "3131"
|
||||
ansible_become_pass: "3131"
|
||||
ansible_ssh_common_args: '-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null'
|
||||
onecloud1:
|
||||
ansible_host: onecloud1.tailnet-68f9.ts.net
|
||||
ansible_user: root
|
||||
ansible_ssh_private_key_file: ~/.ssh/id_ed25519
|
||||
ansible_ssh_common_args: '-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null'
|
||||
influxdb1:
|
||||
ansible_host: influxdb1.tailnet-68f9.ts.net
|
||||
ansible_user: root
|
||||
ansible_ssh_private_key_file: ~/.ssh/id_ed25519
|
||||
ansible_ssh_common_args: '-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null'
|
||||
browser:
|
||||
ansible_host: browser.tailnet-68f9.ts.net
|
||||
ansible_user: root
|
||||
ansible_ssh_private_key_file: ~/.ssh/id_ed25519
|
||||
ansible_ssh_common_args: '-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null'
|
||||
ash1d:
|
||||
ansible_host: ash1d.tailnet-68f9.ts.net
|
||||
ansible_user: ben
|
||||
ansible_ssh_pass: "3131"
|
||||
ansible_become_pass: "3131"
|
||||
ansible_ssh_common_args: '-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null'
|
||||
vars:
|
||||
ansible_python_interpreter: /usr/bin/python3
|
||||
|
|
@ -0,0 +1,48 @@
|
|||
---
|
||||
# Ansible 探马 - 检查所有客户端节点的基础环境
|
||||
- name: 侦察客户端节点基础环境
|
||||
hosts: all
|
||||
gather_facts: yes
|
||||
tasks:
|
||||
- name: 收集系统架构信息
|
||||
debug:
|
||||
msg: "节点 {{ inventory_hostname }} - 架构: {{ ansible_architecture }} - 系统: {{ ansible_distribution }} {{ ansible_distribution_version }}"
|
||||
|
||||
- name: 检查 HashiCorp 软件包安装状态
|
||||
shell: |
|
||||
echo "=== HashiCorp 软件包检查 ==="
|
||||
echo "Nomad: $(nomad version 2>/dev/null || echo '未安装')"
|
||||
echo "Consul: $(consul version 2>/dev/null || echo '未安装')"
|
||||
echo "Vault: $(vault version 2>/dev/null || echo '未安装')"
|
||||
register: hashicorp_status
|
||||
|
||||
- name: 检查 HashiCorp 软件源配置
|
||||
shell: |
|
||||
echo "=== 软件源配置检查 ==="
|
||||
if [ -f /etc/apt/sources.list.d/hashicorp.list ]; then
|
||||
echo "HashiCorp 源文件存在:"
|
||||
cat /etc/apt/sources.list.d/hashicorp.list
|
||||
else
|
||||
echo "HashiCorp 源文件不存在"
|
||||
fi
|
||||
register: sources_status
|
||||
|
||||
- name: 检查系统服务状态
|
||||
shell: |
|
||||
echo "=== 系统服务状态 ==="
|
||||
echo "Nomad: $(systemctl is-active nomad 2>/dev/null || echo '未配置')"
|
||||
echo "Consul: $(systemctl is-active consul 2>/dev/null || echo '未配置')"
|
||||
echo "Podman: $(systemctl is-active podman 2>/dev/null || echo '未配置')"
|
||||
register: services_status
|
||||
|
||||
- name: 显示侦察结果
|
||||
debug:
|
||||
msg: |
|
||||
==========================================
|
||||
节点: {{ inventory_hostname }}
|
||||
架构: {{ ansible_architecture }}
|
||||
==========================================
|
||||
{{ hashicorp_status.stdout }}
|
||||
{{ sources_status.stdout }}
|
||||
{{ services_status.stdout }}
|
||||
==========================================
|
||||
|
|
@ -0,0 +1,170 @@
|
|||
#!/bin/bash
|
||||
|
||||
# HCP 集群先决条件检查脚本
|
||||
# 检查所有客户端节点的 HashiCorp 软件包安装状态
|
||||
|
||||
set -e
|
||||
|
||||
# 客户端节点列表
|
||||
CLIENT_NODES=(
|
||||
"ash2e.tailnet-68f9.ts.net"
|
||||
"ash1d.tailnet-68f9.ts.net"
|
||||
"hcp1.tailnet-68f9.ts.net"
|
||||
"influxdb.tailnet-68f9.ts.net"
|
||||
"ash3c.tailnet-68f9.ts.net"
|
||||
"ch4.tailnet-68f9.ts.net"
|
||||
"warden.tailnet-68f9.ts.net"
|
||||
"browser.tailnet-68f9.ts.net"
|
||||
)
|
||||
|
||||
SSH_OPTS="-o StrictHostKeyChecking=no -o ConnectTimeout=5"
|
||||
PASSWORD="3131"
|
||||
|
||||
echo "=== HCP 集群先决条件检查开始 ==="
|
||||
echo "检查时间: $(date)"
|
||||
echo
|
||||
|
||||
# 检查函数
|
||||
check_node_prerequisites() {
|
||||
local node=$1
|
||||
echo "检查节点: $node"
|
||||
|
||||
# 检查网络连通性
|
||||
if ! ping -c 1 -W 2 "$node" >/dev/null 2>&1; then
|
||||
echo " ❌ 网络不通"
|
||||
return 1
|
||||
fi
|
||||
|
||||
# 检查 SSH 连接
|
||||
if ! sshpass -p "$PASSWORD" ssh $SSH_OPTS ben@"$node" "echo 'SSH OK'" >/dev/null 2>&1; then
|
||||
echo " ❌ SSH 连接失败"
|
||||
return 1
|
||||
fi
|
||||
|
||||
echo " ✅ 网络和 SSH 连接正常"
|
||||
|
||||
# 检查 HashiCorp 软件源配置
|
||||
echo " 检查 HashiCorp 软件源..."
|
||||
sshpass -p "$PASSWORD" ssh $SSH_OPTS ben@"$node" "
|
||||
if [ -f /etc/apt/sources.list.d/hashicorp.list ]; then
|
||||
echo ' ✅ HashiCorp 软件源文件存在'
|
||||
if grep -q 'trusted=yes' /etc/apt/sources.list.d/hashicorp.list; then
|
||||
echo ' ✅ 已配置 trusted=yes'
|
||||
else
|
||||
echo ' ⚠️ 未配置 trusted=yes'
|
||||
fi
|
||||
cat /etc/apt/sources.list.d/hashicorp.list | sed 's/^/ /'
|
||||
else
|
||||
echo ' ❌ HashiCorp 软件源文件不存在'
|
||||
fi
|
||||
"
|
||||
|
||||
# 检查二进制文件安装
|
||||
echo " 检查 HashiCorp 二进制文件..."
|
||||
sshpass -p "$PASSWORD" ssh $SSH_OPTS ben@"$node" "
|
||||
for binary in nomad consul vault; do
|
||||
if command -v \$binary >/dev/null 2>&1; then
|
||||
version=\$(\$binary version | head -n1)
|
||||
echo \" ✅ \$binary: \$version\"
|
||||
else
|
||||
echo \" ❌ \$binary: 未安装\"
|
||||
fi
|
||||
done
|
||||
"
|
||||
|
||||
# 检查系统服务状态
|
||||
echo " 检查系统服务状态..."
|
||||
sshpass -p "$PASSWORD" ssh $SSH_OPTS ben@"$node" "
|
||||
for service in nomad consul; do
|
||||
if systemctl is-enabled \$service >/dev/null 2>&1; then
|
||||
status=\$(systemctl is-active \$service)
|
||||
echo \" \$service: \$status\"
|
||||
else
|
||||
echo \" \$service: 未配置\"
|
||||
fi
|
||||
done
|
||||
"
|
||||
|
||||
echo
|
||||
}
|
||||
|
||||
# 修复软件源配置的函数
|
||||
fix_hashicorp_sources() {
|
||||
local node=$1
|
||||
echo "修复节点 $node 的 HashiCorp 软件源配置..."
|
||||
|
||||
sshpass -p "$PASSWORD" ssh $SSH_OPTS ben@"$node" "
|
||||
echo '修复 HashiCorp 软件源配置...'
|
||||
|
||||
# 备份现有配置
|
||||
if [ -f /etc/apt/sources.list.d/hashicorp.list ]; then
|
||||
echo '$PASSWORD' | sudo -S cp /etc/apt/sources.list.d/hashicorp.list /etc/apt/sources.list.d/hashicorp.list.bak
|
||||
fi
|
||||
|
||||
# 创建新的软件源配置 (trusted=yes)
|
||||
echo '$PASSWORD' | sudo -S tee /etc/apt/sources.list.d/hashicorp.list > /dev/null << 'EOF'
|
||||
deb [arch=amd64 trusted=yes] https://apt.releases.hashicorp.com jammy main
|
||||
EOF
|
||||
|
||||
# 更新软件包列表
|
||||
echo '$PASSWORD' | sudo -S apt update
|
||||
|
||||
echo '✅ HashiCorp 软件源配置已修复'
|
||||
"
|
||||
}
|
||||
|
||||
# 安装缺失软件包的函数
|
||||
install_missing_packages() {
|
||||
local node=$1
|
||||
echo "在节点 $node 上安装 HashiCorp 软件包..."
|
||||
|
||||
sshpass -p "$PASSWORD" ssh $SSH_OPTS ben@"$node" "
|
||||
echo '安装 HashiCorp 软件包...'
|
||||
echo '$PASSWORD' | sudo -S apt install -y nomad consul vault
|
||||
echo '✅ HashiCorp 软件包安装完成'
|
||||
"
|
||||
}
|
||||
|
||||
# 主检查流程
|
||||
main() {
|
||||
local failed_nodes=()
|
||||
local needs_source_fix=()
|
||||
local needs_package_install=()
|
||||
|
||||
# 第一轮:检查所有节点
|
||||
for node in "${CLIENT_NODES[@]}"; do
|
||||
if ! check_node_prerequisites "$node"; then
|
||||
failed_nodes+=("$node")
|
||||
fi
|
||||
done
|
||||
|
||||
# 汇总报告
|
||||
echo "=== 检查结果汇总 ==="
|
||||
if [ ${#failed_nodes[@]} -eq 0 ]; then
|
||||
echo "✅ 所有节点先决条件检查通过"
|
||||
else
|
||||
echo "⚠️ 以下节点需要修复:"
|
||||
for node in "${failed_nodes[@]}"; do
|
||||
echo " - $node"
|
||||
done
|
||||
|
||||
echo
|
||||
echo "是否要自动修复这些节点? (y/N)"
|
||||
read -r response
|
||||
if [[ "$response" =~ ^[Yy]$ ]]; then
|
||||
for node in "${failed_nodes[@]}"; do
|
||||
echo "修复节点: $node"
|
||||
fix_hashicorp_sources "$node"
|
||||
install_missing_packages "$node"
|
||||
echo
|
||||
done
|
||||
|
||||
echo "=== 重新检查修复后的节点 ==="
|
||||
for node in "${failed_nodes[@]}"; do
|
||||
check_node_prerequisites "$node"
|
||||
done
|
||||
fi
|
||||
fi
|
||||
}
|
||||
|
||||
main "$@"
|
||||
|
|
@ -0,0 +1,45 @@
|
|||
# 测试 OpenTofu 本机功能
|
||||
terraform {
|
||||
required_providers {
|
||||
null = {
|
||||
source = "registry.opentofu.org/hashicorp/null"
|
||||
version = "3.2.4"
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# 本机测试 - 创建文件
|
||||
resource "null_resource" "local_test" {
|
||||
provisioner "local-exec" {
|
||||
command = <<EOF
|
||||
echo "=== 基线测试:模拟完整的配置替换流程 ==="
|
||||
echo "时间: $(date)"
|
||||
|
||||
echo "1. 模拟上传配置文件..."
|
||||
cp ../nomad-configs-tofu/onecloud1-server.hcl /tmp/nomad-test.hcl
|
||||
echo " 配置文件已复制到 /tmp/nomad-test.hcl"
|
||||
|
||||
echo "2. 模拟替换运行配置..."
|
||||
sudo mkdir -p /tmp/test-nomad.d
|
||||
sudo cp /tmp/nomad-test.hcl /tmp/test-nomad.d/nomad.hcl
|
||||
echo " 配置已替换到 /tmp/test-nomad.d/nomad.hcl"
|
||||
|
||||
echo "3. 模拟服务重启..."
|
||||
sleep 2
|
||||
echo " 服务重启完成 (模拟)"
|
||||
|
||||
echo "4. 验证配置..."
|
||||
ls -la /tmp/test-nomad.d/
|
||||
echo "=== 基线测试完成 - 总耗时应该在2-3秒 ==="
|
||||
echo "完成时间: $(date)"
|
||||
EOF
|
||||
}
|
||||
|
||||
triggers = {
|
||||
test_run = timestamp()
|
||||
}
|
||||
}
|
||||
|
||||
output "test_result" {
|
||||
value = "本机测试完成"
|
||||
}
|
||||
|
|
@ -1,109 +0,0 @@
|
|||
# OpenTofu配置 - 使用Terraform的插件缓存
|
||||
terraform {
|
||||
required_providers {
|
||||
oci = {
|
||||
source = "oracle/oci"
|
||||
version = "7.20.0" # 使用已有的版本
|
||||
}
|
||||
consul = {
|
||||
source = "hashicorp/consul"
|
||||
version = "2.22.0" # 使用已有的版本
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# 从Consul获取韩国区域配置
|
||||
data "consul_keys" "oracle_config_kr" {
|
||||
key {
|
||||
name = "tenancy_ocid"
|
||||
path = "config/dev/oracle/kr/tenancy_ocid"
|
||||
}
|
||||
key {
|
||||
name = "user_ocid"
|
||||
path = "config/dev/oracle/kr/user_ocid"
|
||||
}
|
||||
key {
|
||||
name = "fingerprint"
|
||||
path = "config/dev/oracle/kr/fingerprint"
|
||||
}
|
||||
key {
|
||||
name = "private_key_path"
|
||||
path = "config/dev/oracle/kr/private_key_path"
|
||||
}
|
||||
key {
|
||||
name = "region"
|
||||
path = "config/dev/oracle/kr/region"
|
||||
}
|
||||
}
|
||||
|
||||
# 从Consul获取美国区域配置
|
||||
data "consul_keys" "oracle_config_us" {
|
||||
key {
|
||||
name = "tenancy_ocid"
|
||||
path = "config/dev/oracle/us/tenancy_ocid"
|
||||
}
|
||||
key {
|
||||
name = "user_ocid"
|
||||
path = "config/dev/oracle/us/user_ocid"
|
||||
}
|
||||
key {
|
||||
name = "fingerprint"
|
||||
path = "config/dev/oracle/us/fingerprint"
|
||||
}
|
||||
key {
|
||||
name = "private_key_path"
|
||||
path = "config/dev/oracle/us/private_key_path"
|
||||
}
|
||||
key {
|
||||
name = "region"
|
||||
path = "config/dev/oracle/us/region"
|
||||
}
|
||||
}
|
||||
|
||||
# 韩国区域Provider
|
||||
provider "oci" {
|
||||
tenancy_ocid = data.consul_keys.oracle_config_kr.var.tenancy_ocid
|
||||
user_ocid = data.consul_keys.oracle_config_kr.var.user_ocid
|
||||
fingerprint = data.consul_keys.oracle_config_kr.var.fingerprint
|
||||
private_key_path = data.consul_keys.oracle_config_kr.var.private_key_path
|
||||
region = data.consul_keys.oracle_config_kr.var.region
|
||||
}
|
||||
|
||||
# 美国区域Provider
|
||||
provider "oci" {
|
||||
alias = "us"
|
||||
tenancy_ocid = data.consul_keys.oracle_config_us.var.tenancy_ocid
|
||||
user_ocid = data.consul_keys.oracle_config_us.var.user_ocid
|
||||
fingerprint = data.consul_keys.oracle_config_us.var.fingerprint
|
||||
private_key_path = data.consul_keys.oracle_config_us.var.private_key_path
|
||||
region = data.consul_keys.oracle_config_us.var.region
|
||||
}
|
||||
|
||||
# 测试韩国区域连接
|
||||
data "oci_identity_compartments" "kr_compartments" {
|
||||
compartment_id = data.consul_keys.oracle_config_kr.var.tenancy_ocid
|
||||
}
|
||||
|
||||
# 测试美国区域连接
|
||||
data "oci_identity_compartments" "us_compartments" {
|
||||
provider = oci.us
|
||||
compartment_id = data.consul_keys.oracle_config_us.var.tenancy_ocid
|
||||
}
|
||||
|
||||
# 输出Consul数据
|
||||
output "consul_kr_config" {
|
||||
value = data.consul_keys.oracle_config_kr.var
|
||||
}
|
||||
|
||||
output "consul_us_config" {
|
||||
value = data.consul_keys.oracle_config_us.var
|
||||
}
|
||||
|
||||
# 输出OCI数据
|
||||
output "kr_compartments" {
|
||||
value = data.oci_identity_compartments.kr_compartments.compartments
|
||||
}
|
||||
|
||||
output "us_compartments" {
|
||||
value = data.oci_identity_compartments.us_compartments.compartments
|
||||
}
|
||||
|
|
@ -0,0 +1,19 @@
|
|||
#!/bin/bash
|
||||
|
||||
# tmux监控脚本
|
||||
echo "=== TMUX 会话状态 ==="
|
||||
tmux list-sessions
|
||||
|
||||
echo -e "\n=== TMUX 窗口状态 ==="
|
||||
tmux list-windows -a
|
||||
|
||||
echo -e "\n=== 当前活跃窗口内容预览 ==="
|
||||
for session in $(tmux list-sessions -F "#{session_name}"); do
|
||||
echo "--- 会话: $session ---"
|
||||
for window in $(tmux list-windows -t "$session" -F "#{window_index}"); do
|
||||
window_name=$(tmux list-windows -t "$session" -F "#{window_name}" | sed -n "${window}p")
|
||||
echo "窗口 $window ($window_name):"
|
||||
tmux capture-pane -t "$session:$window" -p | tail -3
|
||||
echo ""
|
||||
done
|
||||
done
|
||||
|
|
@ -0,0 +1,34 @@
|
|||
#!/bin/bash
|
||||
|
||||
# 简单的 webhook 服务来触发 Nomad 部署
|
||||
# 一镜到底解决流程问题
|
||||
|
||||
echo "🚀 启动 Nomad 部署 webhook 服务..."
|
||||
|
||||
# 监听 webhook 请求
|
||||
while true; do
|
||||
echo "📡 等待 webhook 请求..."
|
||||
|
||||
# 模拟 webhook 触发
|
||||
if [ -f "/tmp/trigger_deploy" ]; then
|
||||
echo "🎯 检测到部署触发信号!"
|
||||
rm -f /tmp/trigger_deploy
|
||||
|
||||
echo "🔄 开始部署 Nomad 配置..."
|
||||
cd /root/mgmt
|
||||
|
||||
# 部署所有 Nomad 节点配置
|
||||
ansible-playbook -i deployment/ansible/inventories/production/inventory.ini \
|
||||
deployment/ansible/playbooks/deploy-nomad-config.yml \
|
||||
--ask-become-pass <<< "3131"
|
||||
|
||||
echo "✅ Nomad 部署完成!"
|
||||
|
||||
# 检查部署结果
|
||||
nomad node status
|
||||
|
||||
echo "🎉 一镜到底完成!"
|
||||
fi
|
||||
|
||||
sleep 5
|
||||
done
|
||||
Loading…
Reference in New Issue