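# Summary: Handling Expired Nomad Client Nodes
## Objective
Remove three expired client nodes from the Nomad cluster:
1. bj-semaphore (ID: fa91f05f)
2. kr-ch2 (ID: 369f60be)
3. kr-ch3 (ID: 3bd9e893)
## Completed Operations
### 1. Marked the nodes ineligible for scheduling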
```
nomad node eligibility -address=http://100.86.141.112:4646 -disable fa91f05f
nomad node eligibility -address=http://100.86.141.112:4646 -disable 369f60be
nomad node eligibility -address=http://100.86.141.112:4646 -disable 3bd9e893
```
### 2. Forced node drain
```
nomad node drain -address=http://100.86.141.112:4646 -enable -force fa91f05f
nomad node drain -address=http://100.86.141.112:4646 -enable -force 369f60be
nomad node drain -address=http://100.86.141.112:4646 -enable -force 3bd9e893
```
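The commands in steps 1 and 2 differ only in the node ID, so they can be issued from one loop. The following is a minimal sketch, assuming the same server address as above. `NOMAD_ADDR` is the environment variable the Nomad CLI reads in place of `-address`; the `DRY_RUN` switch and the `run` and `retire_nodes` helpers are illustrative names, not Nomad features.

```shell
#!/bin/sh
# Sketch: apply steps 1 and 2 to every expired node in one pass.
# DRY_RUN, run() and retire_nodes() are hypothetical helpers for this example.
set -eu

# NOMAD_ADDR replaces the repeated -address flag.
export NOMAD_ADDR="${NOMAD_ADDR:-http://100.86.141.112:4646}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = execute them
NODES="fa91f05f 369f60be 3bd9e893"

# Print the command in dry-run mode, otherwise execute it unchanged.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Disable scheduling first, then force-drain, for each node in turn.
retire_nodes() {
  for id in $NODES; do
    run nomad node eligibility -disable "$id"
    run nomad node drain -enable -force "$id"
  done
}

retire_nodes
```

With the default `DRY_RUN=1` the script only prints the six commands, which makes it easy to review them before running with `DRY_RUN=0`.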
### 3. Deletion attempts via the HTTP API
```
curl -X DELETE http://100.86.141.112:4646/v1/node/fa91f05f-80d7-1b10-a879-a54ba2fb943f
curl -X DELETE http://100.86.141.112:4646/v1/node/369f60be-2640-93f2-94f5-fe95907d0462
curl -X DELETE http://100.86.141.112:4646/v1/node/3bd9e893-aef4-b732-6c07-63739601ccde
```
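Nomad's HTTP API documents node removal as a purge call, `POST /v1/node/:node_id/purge`, rather than a `DELETE` on the node resource, which may be why the attempts above did not remove the nodes. The following is a minimal sketch of that call, assuming the same server address and full node IDs as above and a cluster without ACLs (otherwise an `X-Nomad-Token` header would be needed). `purge_url`, `purge_node` and `DRY_RUN` are illustrative names.

```shell
#!/bin/sh
# Sketch: purge expired nodes through the documented purge endpoint.
# purge_url(), purge_node() and DRY_RUN are hypothetical helpers for this example.
set -eu

NOMAD_ADDR="${NOMAD_ADDR:-http://100.86.141.112:4646}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the request, 0 = send it

# Build the purge URL for one full node ID.
purge_url() {
  echo "${NOMAD_ADDR}/v1/node/$1/purge"
}

# Print or send the POST request for one node.
purge_node() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "curl -fsS -X POST $(purge_url "$1")"
  else
    curl -fsS -X POST "$(purge_url "$1")"
  fi
}

for id in \
  fa91f05f-80d7-1b10-a879-a54ba2fb943f \
  369f60be-2640-93f2-94f5-fe95907d0462 \
  3bd9e893-aef4-b732-6c07-63739601ccde
do
  purge_node "$id"
done
```

A purged node registers again if its client agent is still running and can reach the servers, so this only makes sense once the client hosts are really gone.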
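### 4. Server node restarts
- Restarted the ash1d.global.global server node
- Restarted the ch2.global.global server node
- The cluster remained stable throughout
### 5. Configuration management update
- Commented out the expired nodes in the Ansible inventory file:
  - ch2 (kr-ch2)
  - ch3 (kr-ch3)
  - semaphoressh (bj-semaphore)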
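Commenting out hosts can be scripted so the change is repeatable. The following is a minimal sketch, assuming an INI-style inventory in which each host entry starts its line with the host alias. Both the inventory format and the `comment_hosts` helper are assumptions, since the actual inventory file is not shown here, and the sample hosts and addresses are placeholders.

```shell
#!/bin/sh
# Sketch: comment out expired hosts in an INI-style Ansible inventory.
# Assumes each host line begins with the host alias; comment_hosts() is a
# hypothetical helper. It reads the inventory on stdin and writes to stdout.
set -eu

comment_hosts() {
  # Match the alias only when followed by whitespace or end of line,
  # so a host such as "ch22" is left untouched.
  sed -E 's/^(ch2|ch3|semaphoressh)([[:space:]]|$)/# \1\2/'
}

# Demo on a made-up inventory fragment (placeholder hosts and addresses).
printf '%s\n' \
  '[nomad_clients]' \
  'ch2 ansible_host=192.0.2.10' \
  'ch3 ansible_host=192.0.2.11' \
  'ch4 ansible_host=192.0.2.12' \
  'semaphoressh ansible_host=192.0.2.13' |
  comment_hosts
```

Writing to stdout instead of editing in place lets the result be compared against the original file before it is swapped in.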
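## Current Status
The three nodes still appear in the Nomad node list, but they are marked ineligible for scheduling and have been drained, so they no longer affect the cluster.
## Recommended Next Steps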
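1. Wait for Nomad to garbage-collect the nodes automatically, expected after about 72 hours. The delay is governed by the servers' `node_gc_threshold` setting (24h in Nomad's documented defaults), so check the value configured on this cluster.
2. Monitor the cluster to confirm it keeps running normally.
3. If the nodes persist, consider a more aggressive manual cleanup.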
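The first two recommendations can be checked with a small script that counts how many of the expired nodes are still listed. The following is a minimal sketch, assuming the default `nomad node status` output in which each row starts with the 8-character short node ID; `count_expired` is an illustrative helper name.

```shell
#!/bin/sh
# Sketch: count how many expired nodes are still listed.
# count_expired() is a hypothetical helper; it reads `nomad node status`
# output on stdin and assumes each row starts with the short node ID.
set -eu

count_expired() {
  # grep -c exits non-zero when the count is 0, so mask that for `set -e`.
  grep -c -E '^(fa91f05f|369f60be|3bd9e893)([[:space:]]|$)' || true
}

# Typical use against the cluster (requires the nomad CLI and NOMAD_ADDR):
#   nomad node status | count_expired
#   nomad system gc        # ask the servers to start a garbage collection pass
```

A result of `0` means all three nodes are gone. If the count stays at 3 well past the expected window, `nomad system gc` is a gentler option to try before any manual cleanup.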
## Related Documents
- Detailed operations report: nomad_expired_nodes_final_report.md
- Restart backup plan: nomad_restart_backup_plan.md
- Removal operations report: nomad_expired_nodes_removal_report.md