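feat: restructure the project directory layout and add several features

- Add scripts and configuration files for managing Nomad nodes and NFS storage
- Add several Ansible playbooks for configuring and debugging the Nomad cluster
- Add Nomad job files for testing Podman and NFS
- Restructure the playbooks directory, grouping playbooks by function
- Update the Nomad client and server configuration templates
- Add SSH key distribution and configuration scripts
- Add several playbooks for debugging and fixing issues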
146
docs/nomad-nfs-setup.md
Normal file
@@ -0,0 +1,146 @@
# Nomad Cluster NFS Configuration Guide

## Overview

This document describes how to configure NFS storage for a Nomad cluster whose nodes span different container types and geographic locations.

## Container Types

### 1. Local LXC containers
- **Location**: local network
- **Example nodes**: influxdb, warden, hcp1, hcp2
- **Characteristics**: use the already-mapped NFS directory directly
- **NFS options**: `rw,sync,vers=4.2`

### 2. Overseas PVE containers
- **Location**: overseas cloud servers
- **Example nodes**: ash1d, ash2e, ash3c, ch2, ch3
- **Characteristics**: need network-tuned mount options (NFSv3, explicit timeout and retry count)
- **NFS options**: `rw,sync,vers=3,timeo=600,retrans=2`

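The two container types differ only in which mount options they receive. A minimal sketch of that mapping as a shell function is shown here; the function name `nfs_opts_for_host` is hypothetical, and the hostname lists are copied from the example nodes, so they may not cover the full inventory. Note that `timeo` is expressed in tenths of a second, so `timeo=600` means a 60-second wait before a retry.

```bash
# Sketch: choose NFS mount options from a node's hostname.
# nfs_opts_for_host is a hypothetical helper, not part of the repository.
nfs_opts_for_host() {
    case "$1" in
        influxdb|warden|hcp1|hcp2)
            # Local LXC containers: NFSv4.2
            echo "rw,sync,vers=4.2" ;;
        ash1d|ash2e|ash3c|ch2|ch3)
            # Overseas PVE containers: NFSv3, timeo=600 (tenths of a second), 2 retries
            echo "rw,sync,vers=3,timeo=600,retrans=2" ;;
        *)
            echo "unknown node type: $1" >&2
            return 1 ;;
    esac
}

nfs_opts_for_host hcp1
nfs_opts_for_host ash1d
```

A `case` statement keeps the mapping in one place; in an Ansible setup the same split would more likely be expressed as inventory groups.
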
## NFS Configuration Details

### NFS server
- **Server**: snail
- **Export path**: `/fs/1000/nfs/Fnsync`
- **Mount point**: `/mnt/fnsync`

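Combined with a set of mount options, the server, export path, and mount point above are enough to form a mount command or an `/etc/fstab` entry. The sketch below only prints them rather than running anything; the helper names are hypothetical, and the `_netdev` option is an addition of this sketch (it delays the mount until the network is up), not something the guide prescribes.

```bash
# Sketch: print (not run) the mount command and fstab entry for this share.
NFS_SERVER="snail"
NFS_EXPORT="/fs/1000/nfs/Fnsync"
MOUNT_POINT="/mnt/fnsync"

# $1 = comma-separated NFS options, e.g. rw,sync,vers=4.2
print_mount_cmd() {
    echo "mount -t nfs -o $1 ${NFS_SERVER}:${NFS_EXPORT} ${MOUNT_POINT}"
}

# fstab fields: device, mount point, type, options, dump, pass.
# _netdev is an assumption of this sketch, not taken from the guide.
print_fstab_line() {
    echo "${NFS_SERVER}:${NFS_EXPORT} ${MOUNT_POINT} nfs $1,_netdev 0 0"
}

print_mount_cmd "rw,sync,vers=4.2"
print_fstab_line "rw,sync,vers=4.2"
```
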
### Current mount status
```bash
# Check the current mount
df -h | grep fnsync
# Output: snail:/fs/1000/nfs/Fnsync 8.2T 2.2T 6.0T 27% /mnt/fnsync
```

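For scripted checks, an exit status is easier to act on than grepping `df` output. One possible sketch reads the kernel's mount table and succeeds only when the mount point is backed by an NFS filesystem; `is_nfs_mounted` is a hypothetical helper, and `/proc/mounts` assumes a Linux node.

```bash
# Sketch: succeed only if the mount point is backed by an NFS filesystem.
# $1 = mount point, $2 = mounts table (defaults to /proc/mounts on Linux).
is_nfs_mounted() {
    awk -v mp="$1" '
        # Field 2 is the mount point, field 3 the filesystem type (nfs or nfs4)
        $2 == mp && $3 ~ /^nfs4?$/ { found = 1 }
        END { exit !found }
    ' "${2:-/proc/mounts}"
}

if is_nfs_mounted /mnt/fnsync; then
    echo "/mnt/fnsync is mounted over NFS"
else
    echo "/mnt/fnsync is NOT an NFS mount" >&2
fi
```
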
## Deployment Steps

### 1. Automated deployment
```bash
chmod +x scripts/deploy-nfs-for-nomad.sh
./scripts/deploy-nfs-for-nomad.sh
```

### 2. Manual step-by-step deployment
```bash
# Step 1: configure the NFS mounts
ansible-playbook -i configuration/inventories/production/inventory.ini \
    playbooks/setup-nfs-by-container-type.yml

# Step 2: configure the Nomad clients
ansible-playbook -i configuration/inventories/production/nomad-cluster.ini \
    playbooks/setup-nomad-nfs-client.yml
```

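The two manual steps can be wrapped so that the second runs only if the first succeeds. This is a sketch, not the contents of `scripts/deploy-nfs-for-nomad.sh`; the `run` and `deploy_nfs_manual` helpers and the `DRY_RUN` variable are conventions invented here for illustration.

```bash
# Sketch: run the two playbooks in order and stop at the first failure.
# DRY_RUN=1 (a convention of this sketch) prints the commands instead of running them.
INV_DIR="configuration/inventories/production"

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

deploy_nfs_manual() {
    # Assumption: the Nomad client step relies on the NFS mount already being in place.
    run ansible-playbook -i "$INV_DIR/inventory.ini" \
        playbooks/setup-nfs-by-container-type.yml || return 1
    run ansible-playbook -i "$INV_DIR/nomad-cluster.ini" \
        playbooks/setup-nomad-nfs-client.yml || return 1
}

DRY_RUN=1 deploy_nfs_manual
```

Running with `DRY_RUN=1` first is a cheap way to confirm the inventory paths before touching any node.
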
## Nomad Job Configuration

### Example Nomad job using the NFS volume

```hcl
job "nfs-example" {
  # volume and task blocks must live inside a group
  group "app" {
    volume "nfs-shared" {
      type      = "host"
      source    = "nfs-shared" # must match a host_volume name in the client configuration
      read_only = false
    }

    task "app" {
      # driver and config are omitted here for brevity

      volume_mount {
        volume      = "nfs-shared"
        destination = "/shared"
        read_only   = false
      }
    }
  }
}
```

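A job can only claim a host volume that the Nomad client has declared. As a sketch of what the client side might look like, the script below writes a config fragment exposing `/mnt/fnsync` as the host volume `nfs-shared`. The helper name and output path are hypothetical, and the real template in this repository may differ.

```bash
# Sketch: write a Nomad client config fragment that exposes the NFS mount
# as the host volume "nfs-shared". The output path is hypothetical.
write_host_volume_config() {
    cat > "$1" <<'EOF'
client {
  host_volume "nfs-shared" {
    path      = "/mnt/fnsync"
    read_only = false
  }
}
EOF
}

write_host_volume_config "${TMPDIR:-/tmp}/nfs-host-volume.hcl"
```

Placing such a fragment in the agent's configuration directory and restarting the client would typically make the volume available; the exact directory depends on how the agent is launched.
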
### Constraints for each container type

```hcl
# Constraint for local LXC containers
# (the pattern is unanchored, so it matches any hostname containing one of these names)
constraint {
  attribute = "${attr.unique.hostname}"
  operator  = "regexp"
  value     = "(influxdb|warden|hcp1|hcp2)"
}

# Constraint for overseas PVE containers
constraint {
  attribute = "${attr.unique.hostname}"
  operator  = "regexp"
  value     = "(ash1d|ash2e|ash3c|ch2|ch3)"
}
```

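Because an unanchored pattern such as `(hcp1|hcp2)` also matches a hostname like `hcp10`, it may be worth generating an anchored pattern from the node list. The sketch below assumes nodes report short hostnames rather than FQDNs; `host_regex` is a hypothetical helper.

```bash
# Sketch: build an anchored alternation from a list of hostnames.
host_regex() {
    # Join the arguments with "|", drop the trailing "|", wrap in ^( ... )$
    joined=$(printf '%s|' "$@")
    printf '^(%s)$\n' "${joined%|}"
}

host_regex influxdb warden hcp1 hcp2
host_regex ash1d ash2e ash3c ch2 ch3
```

The printed strings can be pasted into the `value` field of a `constraint` block.
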
## Verification and Monitoring

### Verification commands
```bash
# Check the NFS mount on every node
# (df reports the parent filesystem if /mnt/fnsync is not mounted, so check the Filesystem column)
ansible all -i configuration/inventories/production/inventory.ini \
    -m shell -a "df -h /mnt/fnsync"

# Check Nomad node status
nomad node status

# Check the status of the NFS job
nomad job status nfs-multi-type-example
```

### Metrics to monitor
- NFS mount status
- Network latency (overseas nodes)
- Storage usage
- Nomad task status

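Storage usage is the easiest of these metrics to check from a script. The following sketch parses the usage column of a `df -P` line and compares it with a threshold; the 80% threshold and the helper names are assumptions of this sketch, not values from the guide.

```bash
# Sketch: warn when the share's usage crosses a threshold.
# THRESHOLD is an assumed value, not one taken from this guide.
THRESHOLD=80

# Extract the usage percentage (5th field) from one line of `df -P` output.
usage_pct() {
    echo "$1" | awk '{ sub(/%/, "", $5); print $5 }'
}

check_usage() {
    pct=$(usage_pct "$1")
    if [ "$pct" -ge "$THRESHOLD" ]; then
        echo "WARN: usage ${pct}% >= ${THRESHOLD}%"
        return 1
    fi
    echo "OK: usage ${pct}%"
}

# Fed here with the sample df line from the mount-status section;
# in practice: check_usage "$(df -P /mnt/fnsync | tail -n 1)"
check_usage "snail:/fs/1000/nfs/Fnsync 8.2T 2.2T 6.0T 27% /mnt/fnsync"
```
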
## Troubleshooting

### Common problems

1. **NFS mount fails**
   - Check network connectivity: `ping snail`
   - Verify the NFS service: `showmount -e snail`
   - Check firewall settings

2. **Overseas nodes connect slowly**
   - Use the NFSv3 protocol
   - Increase the timeout options
   - Consider a caching layer

3. **Nomad volume cannot be mounted**
   - Check the Nomad client configuration
   - Verify directory permissions
   - Check the Nomad service status

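The mount-failure checks can be run in a fixed order with one line of output per check. This is a rough sketch for Linux nodes: `check` and `diagnose_nfs` are hypothetical helpers, `ping -W 2` assumes the iputils flavour of `ping` (timeout in seconds), and `mountpoint` comes from util-linux.

```bash
# Sketch: run the mount-failure checks in order and report each result.
check() {
    label=$1; shift
    # Run the remaining arguments as a command, hiding its output
    if "$@" >/dev/null 2>&1; then
        echo "OK   $label"
    else
        echo "FAIL $label"
        return 1
    fi
}

diagnose_nfs() {
    check "network reachability"   ping -c 1 -W 2 snail
    check "NFS exports visible"    showmount -e snail
    check "mount point is mounted" mountpoint -q /mnt/fnsync
}
```

Calling `diagnose_nfs` on an affected node prints three lines; the first `FAIL` usually indicates where to look.
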
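## Best Practices

1. **Backups**: back up important data on the NFS share regularly
2. **Monitoring and alerting**: set up monitoring of the NFS mount status
3. **Capacity planning**: track storage usage
4. **Network tuning**: configure suitable network parameters for overseas nodes
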
## Related Files

- `playbooks/setup-nfs-by-container-type.yml` - NFS mount configuration
- `playbooks/setup-nomad-nfs-client.yml` - Nomad client configuration
- `jobs/nomad-nfs-multi-type.nomad` - example Nomad job
- `scripts/deploy-nfs-for-nomad.sh` - deployment script