Clean repository: organized structure and GitOps setup
- Organized root directory structure
- Moved orphan files to proper locations
- Updated .gitignore to ignore temporary files
- Set up Gitea Runner for GitOps automation
- Fixed Tailscale access issues
- Added workflow for automated Nomad deployment
commit 89ee6f7967

(new binary image file, 94 KiB, not shown)
@@ -0,0 +1,89 @@
---
title: "⚠️ Important lesson learned: Consul and Nomad access issues"
labels: ["documentation", "networking", "consul", "nomad"]
assignees: []
---

## ⚠️ Important Lesson Learned

### Consul and Nomad Access Issues

**Problem**: Attempts to reach the Consul service via `http://localhost:8500` or `http://127.0.0.1:8500` fail to connect.

#### Root Cause

In this project, the Consul and Nomad services run in the cluster via Nomad + Podman and are reached over the Tailscale network. They do not run locally, so they cannot be accessed through localhost.

#### Solution

##### Use the Tailscale IP

Services must be accessed using the IP address assigned by Tailscale:

```bash
# Show the current node's Tailscale IP
tailscale ip -4

# List all nodes in the Tailscale network
tailscale status

# Access Consul (use the actual Tailscale IP)
curl http://100.x.x.x:8500/v1/status/leader

# Access Nomad (use the actual Tailscale IP)
curl http://100.x.x.x:4646/v1/status/leader
```

##### Service Discovery

- The Consul cluster consists of 3 nodes
- The Nomad cluster consists of more than ten nodes, including server and client nodes
- You need to correctly identify which node a service runs on (see the sketch after this list)
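A minimal sketch of how to locate the node a service runs on, assuming the Consul and Nomad HTTP APIs are reachable on Tailscale IPs and `jq` is installed; the addresses and the `traefik` service name are placeholders, not the exact values of this cluster:

```bash
# Ask Consul which node(s) a given service runs on (placeholder Tailscale IP)
CONSUL_ADDR="http://100.x.x.x:8500"

# List node name and address for the "traefik" service (example service name)
curl -s "$CONSUL_ADDR/v1/catalog/service/traefik" | jq -r '.[] | "\(.Node) \(.Address)"'

# Cross-check with Nomad: list client nodes and their status (placeholder Nomad address)
curl -s "http://100.x.x.x:4646/v1/nodes" | jq -r '.[] | "\(.Name) \(.Status)"'
```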
##### Cluster Architecture

- **Consul cluster**: 3 nodes (kr-master, us-ash3c, bj-warden)
- **Nomad cluster**: more than ten nodes, including server and client nodes

#### Important Reminder

During development and debugging, always remember to use Tailscale IPs rather than localhost to access cluster services. This is a basic requirement of this project's architecture and must be followed strictly.

### Suggested Improvements

1. **Documentation**:
   - Explicitly emphasize the use of Tailscale IPs in all related documentation
   - Add access reminders in code comments
   - Create a FAQ document

2. **Automated checks** (see the sketch after this list):
   - Add automated checks that prevent cluster services from being accessed via localhost
   - Validate the network configuration in the CI/CD pipeline

3. **Training material**:
   - Create training material for new team members
   - Add it to the project onboarding guide
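A minimal sketch of such a check, assuming it runs as a CI step over this repository; the file globs and the grep pattern are illustrative only:

```bash
#!/usr/bin/env bash
# Fail the build if job specs or docs tell people to reach Consul/Nomad via localhost.
set -euo pipefail

# Look for localhost/127.0.0.1 combined with the Consul (8500) or Nomad (4646) ports.
if grep -rnE '(localhost|127\.0\.0\.1):(8500|4646)' \
    --include='*.md' --include='*.nomad' --include='*.hcl' --include='*.yml' .; then
  echo "ERROR: use the Tailscale IP of the target node instead of localhost." >&2
  exit 1
fi
echo "OK: no localhost access to Consul/Nomad found."
```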
## 🎯 My Commitment

### Resolution on Managing HCP Services

**I solemnly promise: I will never use Ansible to manage any HCP service other than Nomad!**

This commitment is based on the following hard-earned lessons:
- System-level services conflict over ports with services managed by Nomad
- Dual management leads to unpredictable behavior
- Nomad should have full control over the services it manages
- Ansible is used only for infrastructure-level management of Nomad itself

## 🎉 Acknowledgements

Thanks to all the developers and community members who have contributed to this project!

---

**Note**: This issue records important lessons learned in this project; all team members must read and understand it. During development, please refer to the relevant documentation in [README.md](../README.md), especially the section on network access.
@@ -0,0 +1,42 @@
# Gitea repository settings
repository:
  name: mgmt
  description: "Infrastructure management project - OpenTofu + Ansible + Nomad + Podman"
  website: ""
  default_branch: main

# Feature toggles
has_issues: true
has_wiki: true
has_projects: true
has_actions: true

# Permission settings
private: false
allow_merge_commits: true
allow_squash_merge: true
allow_rebase_merge: true
delete_branch_on_merge: true

# Actions settings
actions:
  enabled: true
  allow_fork_pull_request_run: true
  default_actions_url: "https://gitea.com"

# Branch protection
branch_protection:
  main:
    enable_push: false
    enable_push_whitelist: true
    push_whitelist_usernames: ["ben"]
    require_signed_commits: false
    enable_merge_whitelist: true
    merge_whitelist_usernames: ["ben"]
    enable_status_check: true
    status_check_contexts: ["validate", "plan"]
    enable_approvals_whitelist: false
    approvals_whitelist_usernames: []
    block_on_rejected_reviews: true
    dismiss_stale_approvals: true
    require_signed_commits: false
@@ -0,0 +1,136 @@
name: Ansible Deploy
on:
  workflow_dispatch:
    inputs:
      environment:
        description: 'Deployment environment'
        required: true
        default: 'dev'
        type: choice
        options:
          - dev
          - staging
          - production
      provider:
        description: 'Cloud provider'
        required: true
        default: 'oracle-cloud'
        type: choice
        options:
          - oracle-cloud
          - huawei-cloud
          - google-cloud
          - digitalocean
          - aws
      playbook:
        description: 'Playbook type'
        required: true
        default: 'bootstrap'
        type: choice
        options:
          - bootstrap
          - security
          - applications
          - monitoring
          - maintenance

env:
  ANSIBLE_VERSION: "8.0.0"

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: ${{ github.event.inputs.environment }}

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install Ansible
        run: |
          pip install ansible==${{ env.ANSIBLE_VERSION }}
          pip install ansible-core
          ansible-galaxy collection install community.general
          ansible-galaxy collection install ansible.posix

      - name: Setup SSH key
        run: |
          mkdir -p ~/.ssh
          echo "${{ secrets.SSH_PRIVATE_KEY }}" > ~/.ssh/id_rsa
          chmod 600 ~/.ssh/id_rsa
          ssh-keyscan -H ${{ secrets.SSH_HOST }} >> ~/.ssh/known_hosts

      - name: Create dynamic inventory
        run: |
          ENV="${{ github.event.inputs.environment }}"
          PROVIDER="${{ github.event.inputs.provider }}"

          # Create the dynamic inventory from the OpenTofu output
          if [ -f "configuration/inventories/$ENV/$PROVIDER-inventory.json" ]; then
            echo "Using existing inventory from OpenTofu output"
            cp configuration/inventories/$ENV/$PROVIDER-inventory.json /tmp/inventory.json
          else
            echo "Creating static inventory"
            cat > /tmp/inventory.ini << EOF
          [$ENV]
          ${{ secrets.TARGET_HOST }} ansible_host=${{ secrets.TARGET_HOST }} ansible_user=${{ secrets.SSH_USER }} ansible_become=yes ansible_become_pass=${{ secrets.SUDO_PASSWORD }}

          [all:vars]
          ansible_ssh_common_args='-o StrictHostKeyChecking=no'
          EOF
          fi

      - name: Run Ansible Playbook
        run: |
          ENV="${{ github.event.inputs.environment }}"
          PLAYBOOK="${{ github.event.inputs.playbook }}"

          cd configuration

          # Pick the correct inventory file
          if [ -f "/tmp/inventory.json" ]; then
            INVENTORY="/tmp/inventory.json"
          else
            INVENTORY="/tmp/inventory.ini"
          fi

          # Run the matching playbook
          case "$PLAYBOOK" in
            "bootstrap")
              ansible-playbook -i $INVENTORY playbooks/bootstrap/main.yml -e "environment=$ENV"
              ;;
            "security")
              ansible-playbook -i $INVENTORY playbooks/security/main.yml -e "environment=$ENV"
              ;;
            "applications")
              ansible-playbook -i $INVENTORY playbooks/applications/main.yml -e "environment=$ENV"
              ;;
            "monitoring")
              ansible-playbook -i $INVENTORY playbooks/monitoring/main.yml -e "environment=$ENV"
              ;;
            "maintenance")
              ansible-playbook -i $INVENTORY playbooks/maintenance/main.yml -e "environment=$ENV"
              ;;
          esac

      - name: Generate deployment report
        run: |
          echo "## Deployment report" > deployment-report.md
          echo "" >> deployment-report.md
          echo "**Environment**: ${{ github.event.inputs.environment }}" >> deployment-report.md
          echo "**Cloud provider**: ${{ github.event.inputs.provider }}" >> deployment-report.md
          echo "**Playbook**: ${{ github.event.inputs.playbook }}" >> deployment-report.md
          echo "**Time**: $(date)" >> deployment-report.md
          echo "**Status**: ✅ Deployment succeeded" >> deployment-report.md

      - name: Upload deployment report
        uses: actions/upload-artifact@v4
        with:
          name: deployment-report-${{ github.event.inputs.environment }}-${{ github.event.inputs.provider }}
          path: deployment-report.md
          retention-days: 30
@@ -0,0 +1,42 @@
name: Deploy Nomad Configurations

on:
  push:
    branches: [ main ]
    paths:
      - 'nomad-configs/**'
      - 'deployment/ansible/**'
  workflow_dispatch:

jobs:
  deploy-nomad:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Deploy Nomad Server Configurations
        run: |
          echo "Deploying Nomad server configurations..."
          cd nomad-configs
          chmod +x scripts/deploy_servers.sh
          ./scripts/deploy_servers.sh

      - name: Deploy Nomad Client Configurations
        run: |
          echo "Deploying Nomad client configurations..."
          cd nomad-configs
          chmod +x scripts/deploy.sh
          ./scripts/deploy.sh

      - name: Run Ansible Playbooks
        run: |
          echo "Running Ansible playbooks..."
          cd deployment/ansible
          ansible-playbook -i inventories/production/inventory.ini playbooks/configure-nomad-unified.yml

      - name: Verify Deployment
        run: |
          echo "Verifying Nomad cluster status..."
          # Add verification steps here
          echo "Deployment completed successfully!"
@@ -0,0 +1,78 @@
name: Application Deployment

on:
  push:
    branches: [ main ]
    paths:
      - 'configuration/**'
      - 'containers/**'
      - '.gitea/workflows/deploy.yml'
  workflow_dispatch:
    inputs:
      environment:
        description: 'Target environment'
        required: true
        default: 'dev'
        type: choice
        options:
          - dev
          - staging
          - production

jobs:
  ansible-check:
    runs-on: ubuntu-latest
    name: Ansible Syntax Check
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install Ansible
        run: |
          pip install ansible ansible-core
          ansible-galaxy collection install community.general
          ansible-galaxy collection install ansible.posix
          ansible-galaxy collection install community.docker

      - name: Ansible syntax check
        run: |
          cd configuration
          for playbook in playbooks/*/*.yml; do
            if [ -f "$playbook" ]; then
              echo "Checking $playbook"
              ansible-playbook --syntax-check "$playbook"
            fi
          done

  deploy:
    runs-on: ubuntu-latest
    name: Deploy Applications
    needs: ansible-check
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install Ansible
        run: |
          pip install ansible ansible-core
          ansible-galaxy collection install community.general
          ansible-galaxy collection install ansible.posix
          ansible-galaxy collection install community.docker

      - name: Deploy applications
        run: |
          cd configuration
          ENV="${{ github.event.inputs.environment || 'dev' }}"
          ansible-playbook -i "inventories/${ENV}/inventory.ini" playbooks/bootstrap/main.yml
        env:
          ANSIBLE_HOST_KEY_CHECKING: False
@@ -0,0 +1,53 @@
name: Docker Build and Deploy

on:
  push:
    branches: [ main ]
    paths:
      - 'containers/**'
      - 'Dockerfile*'
      - '.gitea/workflows/docker.yml'

jobs:
  build:
    runs-on: ubuntu-latest
    name: Build Podman Images
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Podman
        run: |
          sudo apt-get update
          sudo apt-get install -y podman
          podman --version

      - name: Login to Container Registry
        run: |
          echo ${{ secrets.REGISTRY_PASSWORD }} | podman login ${{ secrets.REGISTRY_URL }} --username ${{ secrets.REGISTRY_USERNAME }} --password-stdin

      - name: Build and push images
        run: |
          # Build the application images
          for dockerfile in containers/applications/*/Dockerfile; do
            if [ -f "$dockerfile" ]; then
              app_name=$(basename $(dirname "$dockerfile"))
              echo "Building $app_name"
              podman build -t "${{ secrets.REGISTRY_URL }}/$app_name:${{ github.sha }}" -f "$dockerfile" .
              podman push "${{ secrets.REGISTRY_URL }}/$app_name:${{ github.sha }}"
            fi
          done

  deploy-nomad:
    runs-on: ubuntu-latest
    name: Deploy to Nomad Cluster
    needs: build
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Deploy to Nomad
        run: |
          # Deployment could be done here over SSH to a Nomad management node
          echo "Deploy to Nomad placeholder"
          # Example command: nomad job run -var "image_tag=${{ github.sha }}" jobs/app.nomad
@@ -0,0 +1,91 @@
name: Infrastructure CI/CD

on:
  push:
    branches: [ main, develop ]
    paths:
      - 'infrastructure/**'
      - '.gitea/workflows/infrastructure.yml'
  pull_request:
    branches: [ main ]
    paths:
      - 'infrastructure/**'

jobs:
  validate:
    runs-on: ubuntu-latest
    name: Validate Infrastructure
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Setup OpenTofu
        uses: opentofu/setup-opentofu@v1
        with:
          tofu_version: 1.10.6

      - name: Validate OpenTofu configurations
        run: |
          for dir in infrastructure/providers/*/; do
            if [ -d "$dir" ]; then
              echo "Validating $dir"
              cd "$dir"
              tofu init -backend=false
              tofu validate
              cd - > /dev/null
            fi
          done

      - name: Check formatting
        run: |
          tofu fmt -check -recursive infrastructure/

      - name: Security scan
        run: |
          # A tfsec or checkov scan could be added here
          echo "Security scan placeholder"

  plan:
    runs-on: ubuntu-latest
    name: Plan Infrastructure
    needs: validate
    if: github.event_name == 'pull_request'
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Setup OpenTofu
        uses: opentofu/setup-opentofu@v1
        with:
          tofu_version: 1.10.6

      - name: Plan infrastructure changes
        run: |
          cd infrastructure/environments/dev
          tofu init
          tofu plan -var-file="terraform.tfvars" -out=tfplan
        env:
          # Cloud provider environment variables need to be configured here
          TF_VAR_environment: dev

  apply:
    runs-on: ubuntu-latest
    name: Apply Infrastructure
    needs: validate
    if: github.ref == 'refs/heads/main' && github.event_name == 'push'
    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Setup OpenTofu
        uses: opentofu/setup-opentofu@v1
        with:
          tofu_version: 1.10.6

      - name: Apply infrastructure changes
        run: |
          cd infrastructure/environments/dev
          tofu init
          tofu apply -var-file="terraform.tfvars" -auto-approve
        env:
          TF_VAR_environment: dev
@@ -0,0 +1,175 @@
name: OpenTofu Apply
on:
  push:
    branches: [main]
    paths:
      - 'infrastructure/**'
  workflow_dispatch:
    inputs:
      environment:
        description: 'Deployment environment'
        required: true
        default: 'dev'
        type: choice
        options:
          - dev
          - staging
          - production
      provider:
        description: 'Cloud provider'
        required: true
        default: 'oracle-cloud'
        type: choice
        options:
          - oracle-cloud
          - huawei-cloud
          - google-cloud
          - digitalocean
          - aws

env:
  TOFU_VERSION: "1.10.6"

jobs:
  apply:
    runs-on: ubuntu-latest
    environment: ${{ github.event.inputs.environment || 'dev' }}

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Setup OpenTofu
        uses: opentofu/setup-opentofu@v1
        with:
          tofu_version: ${{ env.TOFU_VERSION }}

      - name: Configure credentials
        run: |
          PROVIDER="${{ github.event.inputs.provider || 'oracle-cloud' }}"
          echo "Setting up credentials for $PROVIDER"

          case "$PROVIDER" in
            "oracle-cloud")
              mkdir -p ~/.oci
              echo "${{ secrets.OCI_PRIVATE_KEY }}" > ~/.oci/oci_api_key.pem
              chmod 600 ~/.oci/oci_api_key.pem
              ;;
            "huawei-cloud")
              export HW_ACCESS_KEY="${{ secrets.HW_ACCESS_KEY }}"
              export HW_SECRET_KEY="${{ secrets.HW_SECRET_KEY }}"
              ;;
            "google-cloud")
              echo "${{ secrets.GCP_SERVICE_ACCOUNT_KEY }}" > /tmp/gcp-key.json
              export GOOGLE_APPLICATION_CREDENTIALS="/tmp/gcp-key.json"
              ;;
            "digitalocean")
              export DIGITALOCEAN_TOKEN="${{ secrets.DO_TOKEN }}"
              ;;
            "aws")
              export AWS_ACCESS_KEY_ID="${{ secrets.AWS_ACCESS_KEY_ID }}"
              export AWS_SECRET_ACCESS_KEY="${{ secrets.AWS_SECRET_ACCESS_KEY }}"
              ;;
          esac

      - name: Create terraform.tfvars
        run: |
          ENV="${{ github.event.inputs.environment || 'dev' }}"
          cd infrastructure/environments/$ENV
          cat > terraform.tfvars << EOF
          environment = "$ENV"
          project_name = "mgmt"
          owner = "ben"

          # Oracle Cloud configuration
          oci_config = {
            tenancy_ocid = "${{ secrets.OCI_TENANCY_OCID }}"
            user_ocid = "${{ secrets.OCI_USER_OCID }}"
            fingerprint = "${{ secrets.OCI_FINGERPRINT }}"
            private_key_path = "~/.oci/oci_api_key.pem"
            region = "ap-seoul-1"
          }

          # Huawei Cloud configuration
          huawei_config = {
            access_key = "${{ secrets.HW_ACCESS_KEY }}"
            secret_key = "${{ secrets.HW_SECRET_KEY }}"
            region = "cn-north-4"
          }

          # Google Cloud configuration
          gcp_config = {
            project_id = "${{ secrets.GCP_PROJECT_ID }}"
            region = "asia-northeast3"
            zone = "asia-northeast3-a"
            credentials = "/tmp/gcp-key.json"
          }

          # DigitalOcean configuration
          do_config = {
            token = "${{ secrets.DO_TOKEN }}"
            region = "sgp1"
          }

          # AWS configuration
          aws_config = {
            access_key = "${{ secrets.AWS_ACCESS_KEY_ID }}"
            secret_key = "${{ secrets.AWS_SECRET_ACCESS_KEY }}"
            region = "ap-northeast-1"
          }
          EOF

      - name: OpenTofu Init
        run: |
          PROVIDER="${{ github.event.inputs.provider || 'oracle-cloud' }}"
          cd infrastructure/providers/$PROVIDER
          tofu init

      - name: OpenTofu Plan
        run: |
          ENV="${{ github.event.inputs.environment || 'dev' }}"
          PROVIDER="${{ github.event.inputs.provider || 'oracle-cloud' }}"
          cd infrastructure/providers/$PROVIDER
          tofu plan \
            -var-file="../../../environments/$ENV/terraform.tfvars" \
            -out=tfplan

      - name: OpenTofu Apply
        run: |
          PROVIDER="${{ github.event.inputs.provider || 'oracle-cloud' }}"
          cd infrastructure/providers/$PROVIDER
          tofu apply -auto-approve tfplan

      - name: Save State
        run: |
          ENV="${{ github.event.inputs.environment || 'dev' }}"
          PROVIDER="${{ github.event.inputs.provider || 'oracle-cloud' }}"
          cd infrastructure/providers/$PROVIDER

          # Remote state storage could be configured here,
          # e.g. uploading to S3, GCS, or another backend
          echo "State saved locally for now"

      - name: Generate Inventory
        run: |
          ENV="${{ github.event.inputs.environment || 'dev' }}"
          PROVIDER="${{ github.event.inputs.provider || 'oracle-cloud' }}"
          cd infrastructure/providers/$PROVIDER

          # Generate the Ansible dynamic inventory
          tofu output -json > ../../../configuration/inventories/$ENV/$PROVIDER-inventory.json

      - name: Trigger Ansible Deployment
        uses: actions/github-script@v7
        with:
          script: |
            github.rest.actions.createWorkflowDispatch({
              owner: context.repo.owner,
              repo: context.repo.repo,
              workflow_id: 'ansible-deploy.yml',
              ref: 'main',
              inputs: {
                environment: '${{ github.event.inputs.environment || "dev" }}',
                provider: '${{ github.event.inputs.provider || "oracle-cloud" }}'
              }
            });
@@ -0,0 +1,148 @@
name: OpenTofu Plan
on:
  pull_request:
    branches: [main, develop]
    paths:
      - 'infrastructure/**'
      - '.gitea/workflows/terraform-plan.yml'

env:
  TOFU_VERSION: "1.10.6"

jobs:
  plan:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        environment: [dev, staging, production]
        provider: [oracle-cloud, huawei-cloud, google-cloud, digitalocean, aws]

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Setup OpenTofu
        uses: opentofu/setup-opentofu@v1
        with:
          tofu_version: ${{ env.TOFU_VERSION }}

      - name: Configure credentials
        run: |
          # Set up credentials for each cloud provider
          echo "Setting up credentials for ${{ matrix.provider }}"

          case "${{ matrix.provider }}" in
            "oracle-cloud")
              mkdir -p ~/.oci
              echo "${{ secrets.OCI_PRIVATE_KEY }}" > ~/.oci/oci_api_key.pem
              chmod 600 ~/.oci/oci_api_key.pem
              ;;
            "huawei-cloud")
              export HW_ACCESS_KEY="${{ secrets.HW_ACCESS_KEY }}"
              export HW_SECRET_KEY="${{ secrets.HW_SECRET_KEY }}"
              ;;
            "google-cloud")
              echo "${{ secrets.GCP_SERVICE_ACCOUNT_KEY }}" > /tmp/gcp-key.json
              export GOOGLE_APPLICATION_CREDENTIALS="/tmp/gcp-key.json"
              ;;
            "digitalocean")
              export DIGITALOCEAN_TOKEN="${{ secrets.DO_TOKEN }}"
              ;;
            "aws")
              export AWS_ACCESS_KEY_ID="${{ secrets.AWS_ACCESS_KEY_ID }}"
              export AWS_SECRET_ACCESS_KEY="${{ secrets.AWS_SECRET_ACCESS_KEY }}"
              ;;
          esac

      - name: Create terraform.tfvars
        run: |
          cd infrastructure/environments/${{ matrix.environment }}
          cat > terraform.tfvars << EOF
          environment = "${{ matrix.environment }}"
          project_name = "mgmt"
          owner = "ben"

          # Oracle Cloud configuration
          oci_config = {
            tenancy_ocid = "${{ secrets.OCI_TENANCY_OCID }}"
            user_ocid = "${{ secrets.OCI_USER_OCID }}"
            fingerprint = "${{ secrets.OCI_FINGERPRINT }}"
            private_key_path = "~/.oci/oci_api_key.pem"
            region = "ap-seoul-1"
          }

          # Huawei Cloud configuration
          huawei_config = {
            access_key = "${{ secrets.HW_ACCESS_KEY }}"
            secret_key = "${{ secrets.HW_SECRET_KEY }}"
            region = "cn-north-4"
          }

          # Google Cloud configuration
          gcp_config = {
            project_id = "${{ secrets.GCP_PROJECT_ID }}"
            region = "asia-northeast3"
            zone = "asia-northeast3-a"
            credentials = "/tmp/gcp-key.json"
          }

          # DigitalOcean configuration
          do_config = {
            token = "${{ secrets.DO_TOKEN }}"
            region = "sgp1"
          }

          # AWS configuration
          aws_config = {
            access_key = "${{ secrets.AWS_ACCESS_KEY_ID }}"
            secret_key = "${{ secrets.AWS_SECRET_ACCESS_KEY }}"
            region = "ap-northeast-1"
          }
          EOF

      - name: OpenTofu Init
        run: |
          cd infrastructure/providers/${{ matrix.provider }}
          tofu init

      - name: OpenTofu Validate
        run: |
          cd infrastructure/providers/${{ matrix.provider }}
          tofu validate

      - name: OpenTofu Plan
        run: |
          cd infrastructure/providers/${{ matrix.provider }}
          tofu plan \
            -var-file="../../../environments/${{ matrix.environment }}/terraform.tfvars" \
            -out=tfplan-${{ matrix.environment }}-${{ matrix.provider }}

      - name: Upload Plan
        uses: actions/upload-artifact@v4
        with:
          name: tfplan-${{ matrix.environment }}-${{ matrix.provider }}
          path: infrastructure/providers/${{ matrix.provider }}/tfplan-${{ matrix.environment }}-${{ matrix.provider }}
          retention-days: 30

      - name: Comment PR
        uses: actions/github-script@v7
        if: github.event_name == 'pull_request'
        with:
          script: |
            const fs = require('fs');
            const path = 'infrastructure/providers/${{ matrix.provider }}/tfplan-${{ matrix.environment }}-${{ matrix.provider }}';

            github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `## OpenTofu Plan Results

            **Environment:** ${{ matrix.environment }}
            **Provider:** ${{ matrix.provider }}
            **Status:** ✅ Plan generated successfully

            Plan artifact uploaded: \`tfplan-${{ matrix.environment }}-${{ matrix.provider }}\`

            Please review the plan before merging.`
            });
@@ -0,0 +1,97 @@
# OpenTofu/Terraform
*.tfstate
*.tfstate.*
*.tfvars
!*.tfvars.example
.terraform/
.terraform.lock.hcl
crash.log
crash.*.log

# Ansible
*.retry
.vault_pass
host_vars/*/vault.yml
group_vars/*/vault.yml

# Docker
.env
docker-compose.override.yml

# IDE
.vscode/
.idea/
*.swp
*.swo
*~

# OS
.DS_Store
Thumbs.db

# Logs
*.log
logs/

# Temporary files
tmp/
temp/
.tmp/

# Backup files
backup-*/
*.bak

# Secrets
secrets/
*.pem
*.key
*.crt
!*.example.*

# Node modules (if any)
node_modules/

# Python
__pycache__/
*.pyc
*.pyo
*.pyd
.Python
env/
venv/
.venv/
pip-log.txt
pip-delete-this-directory.txt
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.log
.git
.mypy_cache
.pytest_cache
.hypothesis

# Local development
.local/
local-*

# Dot files
# .* (except .gitea)
.*
!.gitea/

# Gitea Runner files
actions-runner-linux-*.tar.gz

# Webhook test scripts
scripts/webhook-*.py
scripts/test-*.py
scripts/register-runner.exp
scripts/deploy-*-webhook.sh

# Downloaded packages
*.tar.gz
*.zip
*.deb
*.rpm
@@ -0,0 +1 @@
/mnt/fnsync/mcp/mcp_shared_config.json
@@ -0,0 +1,284 @@
# Management Infrastructure

## 🚨 Critical Issue Log

### Nomad Consul KV Template Syntax Issue

**Problem description:**
Nomad cannot read configuration from Consul KV and reports: `Missing: kv.block(config/dev/cloudflare/token)`

**Root cause:**
1. **The Nomad clients are not configured with a Consul connection** - Nomad cannot reach Consul KV
2. **The template syntax is correct** - `{{ key "path/to/key" }}` is the right syntax
3. **The Consul KV data exists** - `config/dev/cloudflare/token` is indeed present

**Solution:**
1. **Temporary workaround** - hard-code the token into the configuration file
2. **Long-term fix** - configure the Nomad clients to connect to Consul (see the sketch below)

**Core requirements:**
- **Centralized storage** → Consul KV holds all sensitive configuration
- **Decentralized deployment** → Nomad reads configuration from Consul and deploys it across nodes
- **Direct reads** → Nomad's template system reads configuration straight from Consul KV

**Current status:**
- ✅ Consul KV storage works
- ✅ The Traefik service runs normally
- ❌ Nomad cannot read Consul KV (the connection still needs to be configured)

**Next steps:**
1. Configure the Nomad clients to connect to Consul
2. Restore the template syntax that reads configuration from Consul KV
3. Achieve truly centralized configuration management
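A minimal sketch of that long-term fix, assuming the Nomad client reads extra config from `/etc/nomad.d/` and that a Consul agent listens on the node's Tailscale IP; the file path and the address below are placeholders, not the exact values of this cluster:

```bash
# Point the Nomad client at the Consul agent so `{{ key "..." }}` templates resolve.
# Config path and address are assumptions; adjust them to the actual node.
sudo tee /etc/nomad.d/consul.hcl > /dev/null << 'EOF'
consul {
  # Consul HTTP API on this node (Tailscale IP, not localhost)
  address = "100.x.x.x:8500"
}
EOF

sudo systemctl restart nomad
nomad node status -self   # verify the client re-registers and is ready
```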
---

## 🎯 Traefik Configuration Architecture: Separating Configuration from the Application

### ⚠️ Important: Avoid Amateurish Practices

**❌ Wrong approach (looks amateurish):**
- Editing the Nomad job file to add a new domain
- Redeploying the whole Traefik service
- Embedding configuration inside the application definition

**✅ Correct approach (clean and professional):**

### Separated Configuration Architecture

**1. Configuration file locations:**
- **Dynamic configuration**: `/root/mgmt/components/traefik/config/dynamic.yml`
- **Application configuration**: `/root/mgmt/components/traefik/jobs/traefik-cloudflare-git4ta-live.nomad`

**2. Key properties:**
- ✅ **Hot reload**: Traefik is configured with the `file` provider and `watch: true`
- ✅ **Automatic effect**: changes to the YAML configuration file take effect automatically, no restart needed
- ✅ **Separation of concerns**: configuration is fully separated from the application, following best practice

**3. Workflow for adding a new domain:**
```bash
# Only the configuration file needs editing
vim /root/mgmt/components/traefik/config/dynamic.yml

# Add the new router configuration
routers:
  new-service-ui:
    rule: "Host(`new-service.git-4ta.live`)"
    service: new-service-cluster
    entryPoints:
      - websecure
    tls:
      certResolver: cloudflare

# Takes effect immediately after saving - no restart required!
```

**4. Architectural advantages:**
- 🚀 **Zero downtime**: configuration changes require no service restart
- 🔧 **Flexible management**: configuration and application are managed independently
- 📝 **Version control**: the configuration file can be versioned on its own
- 🎯 **Professional standard**: in line with modern DevOps best practices

**Remember: separating configuration from the application is a core principle of modern infrastructure management!**

---

## Architecture Overview

### Centralized + Decentralized Architecture

**Centralized storage:**
- **Consul KV** → stores all sensitive configuration (tokens, certificates, keys)
- **Consul Service Discovery** → service registration and discovery
- **Consul Health Checks** → service health checking

**Decentralized deployment:**
- **Asia node** → `warden.tailnet-68f9.ts.net` (Beijing)
- **Asia node** → `ch4.tailnet-68f9.ts.net` (Korea)
- **Americas node** → `ash3c.tailnet-68f9.ts.net` (USA)

### Service Endpoints

- `https://consul.git-4ta.live` → Consul UI
- `https://traefik.git-4ta.live` → Traefik Dashboard
- `https://nomad.git-4ta.live` → Nomad UI
- `https://vault.git-4ta.live` → Vault UI
- `https://waypoint.git-4ta.live` → Waypoint UI
- `https://authentik.git-4ta.live` → Authentik identity provider

### Technology Stack

- **Nomad** → workload orchestration
- **Consul** → service discovery and configuration management
- **Traefik** → reverse proxy and load balancing
- **Cloudflare** → DNS and SSL certificate management
- **Waypoint** → application deployment platform
- **Authentik** → identity and access management

---

## Deployment Status

### ✅ Done
- [x] Cloudflare token stored in Consul KV
- [x] Wildcard DNS `*.git-4ta.live` configured
- [x] Traefik configured and deployed
- [x] Automatic SSL certificate issuance
- [x] All service endpoints configured
- [x] Vault migrated to Nomad management
- [x] Vault three-node high-availability deployment
- [x] Waypoint server deployed and bootstrapped
- [x] Waypoint auth token obtained and stored
- [x] Nomad job configurations backed up to Consul KV
- [x] Authentik container deployed and SSH keys configured
- [x] Traefik configuration architecture optimized (configuration separated from application)

### ⚠️ Open
- [ ] Nomad client Consul connection configuration
- [ ] Restore reading configuration from Consul KV
- [ ] Achieve truly centralized configuration management

---

## Quick Start

### Check service status
```bash
# Check all services
curl -k -I https://consul.git4ta.tech
curl -k -I https://traefik.git4ta.tech
curl -k -I https://nomad.git4ta.tech
curl -k -I https://waypoint.git4ta.tech
```

### Deploy Traefik
```bash
cd /root/mgmt
nomad job run components/traefik/jobs/traefik-cloudflare-git4ta-live.nomad
```

### Manage the Traefik configuration (recommended)
```bash
# Adding a new domain only requires editing the configuration file
vim /root/mgmt/components/traefik/config/dynamic.yml

# Takes effect automatically after saving - no restart required!
# This is the elegance of separating configuration from the application
```

### Check Consul KV
```bash
consul kv get config/dev/cloudflare/token
consul kv get -recurse config/
```

### Backup management
```bash
# List the backups
consul kv get backup/nomad-jobs/index

# Show the metadata of the latest backup
consul kv get backup/nomad-jobs/20251004/metadata

# Restore a backup
consul kv get backup/nomad-jobs/20251004/data > restore.tar.gz
tar -xzf restore.tar.gz
```

---

## Important Files

- `components/traefik/config/dynamic.yml` → **Traefik dynamic configuration file (recommended entry point)**
- `components/traefik/jobs/traefik-cloudflare-git4ta-live.nomad` → Traefik Nomad job configuration
- `README-Traefik.md` → **Traefik configuration management guide (required reading)**
- `infrastructure/opentofu/environments/dev/` → Terraform infrastructure configuration
- `deployment/ansible/inventories/production/hosts` → server inventory
- `README-Vault.md` → Vault configuration and usage notes
- `README-Waypoint.md` → Waypoint configuration and usage notes
- `README-Backup.md` → backup management and restore notes
- `nomad-jobs/vault-cluster.nomad` → Vault Nomad job configuration
- `waypoint-server.nomad` → Waypoint Nomad job configuration

---

## 🔧 Service Initialization Notes

### Vault Initialization

**Current status:** Vault uses local file storage and needs to be initialized

**Initialization steps:**
```bash
# 1. Check the vault status
curl -s http://warden.tailnet-68f9.ts.net:8200/v1/sys/health

# 2. Initialize vault (if it returns "no available server")
vault operator init -address=http://warden.tailnet-68f9.ts.net:8200

# 3. Save the unseal keys and root token
# 4. Unseal vault
vault operator unseal -address=http://warden.tailnet-68f9.ts.net:8200 <unseal-key-1>
vault operator unseal -address=http://warden.tailnet-68f9.ts.net:8200 <unseal-key-2>
vault operator unseal -address=http://warden.tailnet-68f9.ts.net:8200 <unseal-key-3>
```

**🔑 Vault key material (final initialization, 2025-10-04):**
```
Unseal Key 1: 5XQ6vSekewZj9SigcIS8KcpnsOyEzgG5UFe/mqPVXkre
Unseal Key 2: vmLu+Ry+hajWjQhX3YVnZG72aZRn5cowcUm5JIVtv/kR
Unseal Key 3: 3eDhfnHZnG9OT6RFOhpoK/aO5TghPypz4XPlXxFMm52F
Unseal Key 4: LWGkYB7qD3GPPc/nRuqKmMUiQex8ygYF1BkSXA1Tov3J
Unseal Key 5: rIidFy7d/SxcPOCrNy569VZ86I56oMQxqL7qVgM+PYPy

Root Token: hvs.OgVR2hEihbHM7qFxtFr7oeo3
```

**Configuration notes:**
- **Storage**: file (local filesystem)
- **Path**: `/opt/nomad/data/vault-storage` (persistent storage)
- **Port**: 8200
- **UI**: enabled
- **Important**: persistent storage is configured, so keys are not lost across restarts

### Waypoint Initialization

**Current status:** Waypoint is running normally; it may need to be re-initialized

**Initialization steps:**
```bash
# 1. Check the waypoint status
curl -I https://waypoint.git-4ta.live

# 2. If re-initialization is needed
waypoint server init -server-addr=https://waypoint.git-4ta.live

# 3. Configure the waypoint CLI
waypoint auth login -server-addr=https://waypoint.git-4ta.live
```

**Configuration notes:**
- **Storage**: local database `/opt/waypoint/waypoint.db`
- **Ports**: HTTP 9701, gRPC 9702
- **UI**: enabled

### Consul Service Registration

**Registered services:**
- ✅ **vault**: `vault.git-4ta.live` (tags: vault, secrets, kv)
- ✅ **waypoint**: `waypoint.git-4ta.live` (tags: waypoint, ci-cd, deployment)
- ✅ **consul**: `consul.git-4ta.live` (tags: consul, service-discovery)
- ✅ **traefik**: `traefik.git-4ta.live` (tags: traefik, proxy, load-balancer)
- ✅ **nomad**: `nomad.git-4ta.live` (tags: nomad, scheduler, orchestrator)

**Health checks:**
- **vault**: `/v1/sys/health`
- **waypoint**: `/`
- **consul**: `/v1/status/leader`
- **traefik**: `/ping`
- **nomad**: `/v1/status/leader`
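For reference, registrations of this shape can be created against the local Consul agent's HTTP API; the payload below is only a sketch (the address, port, and check interval are assumptions, not the exact values used in this cluster):

```bash
# Register the traefik service with its /ping health check on the local Consul agent.
# Address and port values are illustrative; use the node's actual Tailscale IP and ports.
curl -s -X PUT http://100.x.x.x:8500/v1/agent/service/register \
  -H 'Content-Type: application/json' \
  -d '{
    "Name": "traefik",
    "Tags": ["traefik", "proxy", "load-balancer"],
    "Address": "100.x.x.x",
    "Port": 8080,
    "Check": {
      "HTTP": "http://100.x.x.x:8080/ping",
      "Interval": "30s",
      "Timeout": "5s"
    }
  }'

# Confirm the registration
curl -s http://100.x.x.x:8500/v1/catalog/service/traefik | jq .
```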
---

**Last updated:** 2025-10-08 02:55 UTC
**Status:** services running normally; Traefik configuration architecture optimized; Authentik integrated
@@ -0,0 +1,10 @@
[defaults]
inventory = inventory/hosts.yml
host_key_checking = False
timeout = 30
gathering = smart
fact_caching = memory

[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no
pipelining = True
@@ -0,0 +1,106 @@
---
# Ansible playbook: deploy the Consul client to all Nomad nodes
- name: Deploy Consul Client to Nomad nodes
  hosts: nomad_clients:nomad_servers
  become: yes
  vars:
    consul_version: "1.21.5"
    consul_datacenter: "dc1"
    consul_servers:
      - "100.117.106.136:8300"  # master (Korea)
      - "100.122.197.112:8300"  # warden (Beijing)
      - "100.116.80.94:8300"    # ash3c (USA)

  tasks:
    - name: Update APT cache (ignore GPG errors)
      apt:
        update_cache: yes
        force_apt_get: yes
      ignore_errors: yes

    - name: Install consul via APT (assumes the repository already exists)
      apt:
        name: consul={{ consul_version }}-*
        state: present
        force_apt_get: yes
      ignore_errors: yes

    - name: Create consul user (if not exists)
      user:
        name: consul
        system: yes
        shell: /bin/false
        home: /opt/consul
        create_home: yes

    - name: Create consul directories
      file:
        path: "{{ item }}"
        state: directory
        owner: consul
        group: consul
        mode: '0755'
      loop:
        - /opt/consul
        - /opt/consul/data
        - /etc/consul.d
        - /var/log/consul

    - name: Get node Tailscale IP
      shell: ip addr show tailscale0 | grep 'inet ' | awk '{print $2}' | cut -d'/' -f1
      register: tailscale_ip
      failed_when: tailscale_ip.stdout == ""

    - name: Create consul client configuration
      template:
        src: templates/consul-client.hcl.j2
        dest: /etc/consul.d/consul.hcl
        owner: consul
        group: consul
        mode: '0644'
      notify: restart consul

    - name: Create consul systemd service
      template:
        src: templates/consul.service.j2
        dest: /etc/systemd/system/consul.service
        owner: root
        group: root
        mode: '0644'
      notify: reload systemd

    - name: Enable and start consul service
      systemd:
        name: consul
        enabled: yes
        state: started
      notify: restart consul

    - name: Wait for consul to be ready
      uri:
        url: "http://{{ tailscale_ip.stdout }}:8500/v1/status/leader"
        status_code: 200
        timeout: 5
      register: consul_leader_status
      until: consul_leader_status.status == 200
      retries: 30
      delay: 5

    - name: Verify consul cluster membership
      shell: consul members -status=alive -format=json | jq -r '.[].Name'
      register: consul_members
      changed_when: false

    - name: Display cluster status
      debug:
        msg: "Node {{ inventory_hostname.split('.')[0] }} joined cluster with {{ consul_members.stdout_lines | length }} members"

  handlers:
    - name: reload systemd
      systemd:
        daemon_reload: yes

    - name: restart consul
      systemd:
        name: consul
        state: restarted
@@ -0,0 +1,198 @@
---
# Ansible playbook: fix the zsh configuration on the warden node
- name: Fix zsh configuration on warden node
  hosts: warden
  become: yes
  vars:
    target_user: ben  # or whichever user you want to fix

  tasks:
    - name: Check the current shell
      shell: echo $SHELL
      register: current_shell
      changed_when: false

    - name: Show the current shell
      debug:
        msg: "Current shell: {{ current_shell.stdout }}"

    - name: Make sure zsh is installed
      package:
        name: zsh
        state: present

    - name: Back up the existing zsh configuration files
      shell: |
        if [ -f ~/.zshrc ]; then
          cp ~/.zshrc ~/.zshrc.backup.$(date +%Y%m%d_%H%M%S)
          echo "Backed up ~/.zshrc"
        fi
        if [ -f ~/.zsh_history ]; then
          cp ~/.zsh_history ~/.zsh_history.backup.$(date +%Y%m%d_%H%M%S)
          echo "Backed up ~/.zsh_history"
        fi
      register: backup_result
      changed_when: backup_result.stdout != ""

    - name: Show the backup result
      debug:
        msg: "{{ backup_result.stdout_lines }}"
      when: backup_result.stdout != ""

    - name: Check whether oh-my-zsh exists
      stat:
        path: ~/.oh-my-zsh
      register: ohmyzsh_exists

    - name: Reinstall oh-my-zsh (if broken)
      shell: |
        if [ -d ~/.oh-my-zsh ]; then
          rm -rf ~/.oh-my-zsh
        fi
        sh -c "$(curl -fsSL https://raw.githubusercontent.com/ohmyzsh/ohmyzsh/master/tools/install.sh)" "" --unattended
      when: not ohmyzsh_exists.stat.exists or ansible_check_mode == false

    - name: Create a basic .zshrc configuration
      copy:
        content: |
          # Path to your oh-my-zsh installation.
          export ZSH="$HOME/.oh-my-zsh"

          # Set name of the theme to load
          ZSH_THEME="robbyrussell"

          # Which plugins would you like to load?
          plugins=(git docker docker-compose kubectl)

          source $ZSH/oh-my-zsh.sh

          # User configuration
          export PATH=$PATH:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin

          # Aliases
          alias ll='ls -alF'
          alias la='ls -A'
          alias l='ls -CF'
          alias ..='cd ..'
          alias ...='cd ../..'

          # Nomad/Consul aliases
          alias nomad-status='nomad status'
          alias consul-members='consul members'

          # History settings
          HISTSIZE=10000
          SAVEHIST=10000
          setopt HIST_IGNORE_DUPS
          setopt HIST_IGNORE_SPACE
          setopt HIST_VERIFY
          setopt SHARE_HISTORY
        dest: ~/.zshrc
        owner: "{{ target_user }}"
        group: "{{ target_user }}"
        mode: '0644'
        backup: yes

    - name: Set zsh as the default shell
      user:
        name: "{{ target_user }}"
        shell: /usr/bin/zsh

    - name: Check the zsh configuration syntax
      shell: zsh -n ~/.zshrc
      register: zsh_syntax_check
      failed_when: zsh_syntax_check.rc != 0
      changed_when: false

    - name: Test zsh startup
      shell: zsh -c "echo 'zsh configuration test succeeded'"
      register: zsh_test
      changed_when: false

    - name: Show the fix result
      debug:
        msg:
          - "zsh configuration fix completed"
          - "Syntax check: {{ 'PASS' if zsh_syntax_check.rc == 0 else 'FAIL' }}"
          - "Startup test: {{ zsh_test.stdout }}"

    - name: Clean up a corrupted history file
      shell: |
        if [ -f ~/.zsh_history ]; then
          # Try to repair the history file
          strings ~/.zsh_history > ~/.zsh_history.clean
          mv ~/.zsh_history.clean ~/.zsh_history
          echo "Cleaned the zsh history file"
        fi
      register: history_cleanup
      changed_when: history_cleanup.stdout != ""

    - name: Fix the DNS configuration
      shell: |
        # Back up the existing DNS configuration
        sudo cp /etc/resolv.conf /etc/resolv.conf.backup.$(date +%Y%m%d_%H%M%S)

        # Add fallback DNS servers
        echo "# Fallback DNS server configuration" | sudo tee -a /etc/resolv.conf
        echo "nameserver 8.8.8.8" | sudo tee -a /etc/resolv.conf
        echo "nameserver 8.8.4.4" | sudo tee -a /etc/resolv.conf
        echo "nameserver 1.1.1.1" | sudo tee -a /etc/resolv.conf

        echo "Added fallback DNS servers"
      register: dns_fix
      changed_when: dns_fix.stdout != ""

    - name: Test the DNS fix
      shell: nslookup github.com
      register: dns_test
      changed_when: false

    - name: Show the DNS test result
      debug:
        msg: "{{ dns_test.stdout_lines }}"

    - name: Fix zsh completion permission issues
      shell: |
        # Fix the permissions of the system completion directories
        sudo chown -R root:root /usr/share/zsh/vendor-completions/ 2>/dev/null || true
        sudo chown -R root:root /usr/share/bash-completion/ 2>/dev/null || true
        sudo chown -R root:root /usr/share/fish/vendor_completions.d/ 2>/dev/null || true
        sudo chown -R root:root /usr/local/share/zsh/site-functions/ 2>/dev/null || true

        # Set the correct permissions
        sudo chmod -R 755 /usr/share/zsh/vendor-completions/ 2>/dev/null || true
        sudo chmod -R 755 /usr/share/bash-completion/ 2>/dev/null || true
        sudo chmod -R 755 /usr/share/fish/vendor_completions.d/ 2>/dev/null || true
        sudo chmod -R 755 /usr/local/share/zsh/site-functions/ 2>/dev/null || true

        # Fix the oh-my-zsh completion directory permissions (if present)
        if [ -d ~/.oh-my-zsh ]; then
          chmod -R 755 ~/.oh-my-zsh/completions
          chmod -R 755 ~/.oh-my-zsh/plugins
          chmod -R 755 ~/.oh-my-zsh/lib
          echo "Fixed the oh-my-zsh directory permissions"
        fi

        # Regenerate the completion cache
        rm -f ~/.zcompdump* 2>/dev/null || true
        echo "Fixed the system completion directory permissions and cleared the cache"
      register: completion_fix
      changed_when: completion_fix.stdout != ""

    - name: Show the completion fix result
      debug:
        msg: "{{ completion_fix.stdout_lines }}"
      when: completion_fix.stdout != ""

    - name: Test the zsh completion fix
      shell: zsh -c "autoload -U compinit && compinit -D && echo 'completion system fixed successfully'"
      register: completion_test
      changed_when: false

    - name: Reload zsh configuration reminder
      debug:
        msg:
          - "Fix complete! Run the following to reload the configuration:"
          - "source ~/.zshrc"
          - "or log in again to pick up the new shell configuration"
          - "the completion permission issue has been fixed"
@@ -0,0 +1,10 @@
---
all:
  children:
    warden:
      hosts:
        warden:
          ansible_host: 100.122.197.112
          ansible_user: ben
          ansible_password: "3131"
          ansible_become_password: "3131"
@@ -0,0 +1,61 @@
# Consul Client Configuration for {{ inventory_hostname }}
datacenter = "{{ consul_datacenter }}"
data_dir = "/opt/consul/data"
log_level = "INFO"
node_name = "{{ inventory_hostname.split('.')[0] }}"
bind_addr = "{{ tailscale_ip.stdout }}"

# Client mode (not server)
server = false

# Connect to the Consul servers (the three-node cluster)
retry_join = [
  "100.117.106.136",  # master (Korea)
  "100.122.197.112",  # warden (Beijing)
  "100.116.80.94"     # ash3c (USA)
]

# Performance optimization
performance {
  raft_multiplier = 5
}

# Ports configuration
ports {
  grpc = 8502
  http = 8500
  dns = 8600
}

# Enable Connect for service mesh
connect {
  enabled = true
}

# Cache configuration for performance
cache {
  entry_fetch_max_burst = 42
  entry_fetch_rate = 30
}

# Node metadata
node_meta = {
  region = "{{ region | default('unknown') }}"
  zone = "nomad-server"
}

# UI disabled for clients
ui_config {
  enabled = false
}

# ACL configuration (if needed)
acl = {
  enabled = false
  default_policy = "allow"
}

# Logging
log_file = "/var/log/consul/consul.log"
log_rotate_duration = "24h"
log_rotate_max_files = 7
@@ -0,0 +1,26 @@
[Unit]
Description=Consul Client
Documentation=https://www.consul.io/
Requires=network-online.target
After=network-online.target
ConditionFileNotEmpty=/etc/consul.d/consul.hcl

[Service]
Type=notify
User=consul
Group=consul
ExecStart=/usr/bin/consul agent -config-dir=/etc/consul.d
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
LimitNOFILE=65536

# Security settings
NoNewPrivileges=yes
PrivateTmp=yes
ProtectHome=yes
ProtectSystem=strict
ReadWritePaths=/opt/consul /var/log/consul

[Install]
WantedBy=multi-user.target
@@ -0,0 +1,99 @@
# Nomad Jobs Backup

**Backup time**: 2025-10-04 07:44:11
**Backup reason**: all services running normally, SSL certificates fully configured

## Current Runtime Status

### ✅ Deployed and working services

1. **Traefik** (`traefik-cloudflare-v1`)
   - File: `components/traefik/jobs/traefik-cloudflare.nomad`
   - Status: running, SSL certificates valid
   - Domain: `*.git4ta.me`
   - Certificates: Let's Encrypt (Cloudflare DNS Challenge)

2. **Vault** (`vault-cluster`)
   - File: `nomad-jobs/vault-cluster.nomad`
   - Status: three-node cluster running
   - Nodes: ch4, ash3c, warden
   - Configuration: stored in Consul KV `vault/config`

3. **Waypoint** (`waypoint-server`)
   - File: `waypoint-server.nomad`
   - Status: running
   - Node: hcp1
   - Web UI: `https://waypoint.git4ta.me/auth/token`

### 🔧 Key configuration

#### Traefik configuration notes
- Uses the Cloudflare DNS Challenge to obtain SSL certificates
- Certificate storage: `/local/acme.json` (local storage)
- Domain: `git4ta.me`
- Service routes: consul, nomad, vault, waypoint

#### Vault configuration notes
- Three-node high-availability cluster
- Configuration stored centrally in Consul KV
- Uses the `exec` driver
- Services registered in Consul

#### Waypoint configuration notes
- Uses the `raw_exec` driver
- HTTPS API: 9701, gRPC: 9702
- Bootstrapped, auth token obtained

### 📋 Service endpoints

- `https://consul.git4ta.me` → Consul UI
- `https://traefik.git4ta.me` → Traefik Dashboard
- `https://nomad.git4ta.me` → Nomad UI
- `https://vault.git4ta.me` → Vault UI
- `https://waypoint.git4ta.me/auth/token` → Waypoint UI

### 🔑 Important credentials

#### Vault
- Unseal keys: stored in Consul KV `vault/unseal-keys`
- Root token: stored in Consul KV `vault/root-token`
- Detailed documentation: `/root/mgmt/README-Vault.md`

#### Waypoint
- Auth token: stored in Consul KV `waypoint/auth-token`
- Detailed documentation: `/root/mgmt/README-Waypoint.md`

### 🚀 Deployment commands

```bash
# Deploy Traefik
nomad job run components/traefik/jobs/traefik-cloudflare.nomad

# Deploy Vault
nomad job run nomad-jobs/vault-cluster.nomad

# Deploy Waypoint
nomad job run waypoint-server.nomad
```

### 📝 Notes

1. **Certificate management**: certificates live in `/local/acme.json` inside the Traefik container and are lost when the container restarts
2. **Vault configuration**: all configuration is loaded dynamically from Consul KV; the job must be restarted after changes
3. **Network configuration**: all services use Tailscale network addresses
4. **Backup strategy**: back up the configuration and credentials in Consul KV regularly (a sketch of such a backup follows)
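A minimal sketch of such a periodic backup, assuming `consul kv export` is available on the node and reusing a dated backup-key convention like the one in the main README; the key names and paths below are assumptions, not the exact layout used here:

```bash
#!/usr/bin/env bash
# Export the config/ and vault/ subtrees from Consul KV and store the archive
# back under a dated backup key. Paths and key names are illustrative.
set -euo pipefail

DATE=$(date +%Y%m%d)
BACKUP_DIR=/tmp/consul-kv-backup-$DATE
mkdir -p "$BACKUP_DIR"

# JSON export of the relevant KV prefixes
consul kv export config/ > "$BACKUP_DIR/config.json"
consul kv export vault/  > "$BACKUP_DIR/vault.json"

# Pack it up and store it under the dated backup key
tar -czf "$BACKUP_DIR.tar.gz" -C "$BACKUP_DIR" .
consul kv put "backup/consul-kv/$DATE/data" @"$BACKUP_DIR.tar.gz"
consul kv put "backup/consul-kv/$DATE/metadata" "created=$(date -u +%FT%TZ)"
```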
### 🔄 Restore Steps

To restore to this state:

1. Restore the Consul KV configuration
2. Deploy in order: Traefik → Vault → Waypoint
3. Verify that all service endpoints are reachable
4. Check the SSL certificate status

---

**Backup completed**: 2025-10-04 07:44:11
**Backed up by**: AI Assistant
**Status**: all services running normally ✅
@@ -0,0 +1,19 @@
# Consul Configuration

## Deployment

```bash
nomad job run components/consul/jobs/consul-cluster.nomad
```

## Job Information

- **Job name**: `consul-cluster-nomad`
- **Type**: service
- **Nodes**: master, ash3c, warden

## Access

- Master: `http://master.tailnet-68f9.ts.net:8500`
- Ash3c: `http://ash3c.tailnet-68f9.ts.net:8500`
- Warden: `http://warden.tailnet-68f9.ts.net:8500`
@@ -0,0 +1,88 @@
# Consul configuration file
# This file contains the complete Consul configuration, including variables and storage-related settings

# Base configuration
data_dir = "/opt/consul/data"
raft_dir = "/opt/consul/raft"

# Enable the UI
ui_config {
  enabled = true
}

# Datacenter configuration
datacenter = "dc1"

# Server configuration
server = true
bootstrap_expect = 3

# Network configuration
client_addr = "0.0.0.0"
bind_addr = "{{ GetInterfaceIP `eth0` }}"
advertise_addr = "{{ GetInterfaceIP `eth0` }}"

# Port configuration
ports {
  dns = 8600
  http = 8500
  https = -1
  grpc = 8502
  grpc_tls = 8503
  serf_lan = 8301
  serf_wan = 8302
  server = 8300
}

# Cluster join
retry_join = ["100.117.106.136", "100.116.80.94", "100.122.197.112"]

# Service discovery
enable_service_script = true
enable_script_checks = true
enable_local_script_checks = true

# Performance tuning
performance {
  raft_multiplier = 1
}

# Logging configuration
log_level = "INFO"
enable_syslog = false
log_file = "/var/log/consul/consul.log"

# Security configuration
encrypt = "YourEncryptionKeyHere"

# Connection configuration
reconnect_timeout = "30s"
reconnect_timeout_wan = "30s"
session_ttl_min = "10s"

# Autopilot configuration
autopilot {
  cleanup_dead_servers = true
  last_contact_threshold = "200ms"
  max_trailing_logs = 250
  server_stabilization_time = "10s"
  redundancy_zone_tag = ""
  disable_upgrade_migration = false
  upgrade_version_tag = ""
}

# Snapshot configuration
snapshot {
  enabled = true
  interval = "24h"
  retain = 30
  name = "consul-snapshot-{{.Timestamp}}"
}

# Backup configuration
backup {
  enabled = true
  interval = "6h"
  retain = 7
  name = "consul-backup-{{.Timestamp}}"
}
@ -0,0 +1,93 @@
|
|||
# Consul配置模板文件
|
||||
# 此文件使用Consul模板语法从KV存储中动态获取配置
|
||||
# 遵循 config/{environment}/{provider}/{region_or_service}/{key} 格式
|
||||
|
||||
# 基础配置
|
||||
data_dir = "{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/cluster/data_dir` `/opt/consul/data` }}"
|
||||
raft_dir = "{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/cluster/raft_dir` `/opt/consul/raft` }}"
|
||||
|
||||
# 启用UI
|
||||
ui_config {
|
||||
enabled = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/ui/enabled` `true` }}
|
||||
}
|
||||
|
||||
# 数据中心配置
|
||||
datacenter = "{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/cluster/datacenter` `dc1` }}"
|
||||
|
||||
# 服务器配置
|
||||
server = true
|
||||
bootstrap_expect = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/cluster/bootstrap_expect` `3` }}
|
||||
|
||||
# 网络配置
|
||||
client_addr = "{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/network/client_addr` `0.0.0.0` }}"
|
||||
bind_addr = "{{ GetInterfaceIP (keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/network/bind_interface` `ens160`) }}"
|
||||
advertise_addr = "{{ GetInterfaceIP (keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/network/advertise_interface` `ens160`) }}"
|
||||
|
||||
# 端口配置
|
||||
ports {
|
||||
dns = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/ports/dns` `8600` }}
http = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/ports/http` `8500` }}
https = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/ports/https` `-1` }}
grpc = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/ports/grpc` `8502` }}
grpc_tls = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/ports/grpc_tls` `8503` }}
serf_lan = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/ports/serf_lan` `8301` }}
serf_wan = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/ports/serf_wan` `8302` }}
server = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/ports/server` `8300` }}
}

# Cluster join - node IPs resolved dynamically
retry_join = [
"{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/nodes/master/ip` `100.117.106.136` }}",
"{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/nodes/ash3c/ip` `100.116.80.94` }}",
"{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/nodes/warden/ip` `100.122.197.112` }}"
]

# Service discovery
enable_service_script = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/service/enable_service_script` `true` }}
enable_script_checks = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/service/enable_script_checks` `true` }}
enable_local_script_checks = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/service/enable_local_script_checks` `true` }}

# Performance tuning
performance {
raft_multiplier = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/performance/raft_multiplier` `1` }}
}

# Logging
log_level = "{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/cluster/log_level` `INFO` }}"
enable_syslog = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/log/enable_syslog` `false` }}
log_file = "{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/log/log_file` `/var/log/consul/consul.log` }}"

# Security
encrypt = "{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/cluster/encrypt_key` `YourEncryptionKeyHere` }}"

# Connection settings
reconnect_timeout = "{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/connection/reconnect_timeout` `30s` }}"
reconnect_timeout_wan = "{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/connection/reconnect_timeout_wan` `30s` }}"
session_ttl_min = "{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/connection/session_ttl_min` `10s` }}"

# Autopilot
autopilot {
cleanup_dead_servers = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/autopilot/cleanup_dead_servers` `true` }}
last_contact_threshold = "{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/autopilot/last_contact_threshold` `200ms` }}"
max_trailing_logs = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/autopilot/max_trailing_logs` `250` }}
server_stabilization_time = "{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/autopilot/server_stabilization_time` `10s` }}"
redundancy_zone_tag = ""
disable_upgrade_migration = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/autopilot/disable_upgrade_migration` `false` }}
upgrade_version_tag = ""
}

# Snapshot settings
snapshot {
enabled = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/snapshot/enabled` `true` }}
interval = "{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/snapshot/interval` `24h` }}"
retain = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/snapshot/retain` `30` }}
name = "{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/snapshot/name` `consul-snapshot-{{.Timestamp}}` }}"
}

# Backup settings
backup {
enabled = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/backup/enabled` `true` }}
interval = "{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/backup/interval` `6h` }}"
retain = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/backup/retain` `7` }}
name = "{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/backup/name` `consul-backup-{{.Timestamp}}` }}"
}
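The template above resolves every setting from Consul KV under `config/<ENVIRONMENT>/consul/...`, falling back to the inline defaults when a key is missing. A minimal sketch of seeding a few of those keys before rendering, assuming the environment is named `production` (the environment name and the specific keys chosen are illustrative):

```bash
# Seed KV keys consumed by the template; unset keys fall back to the defaults above.
export ENVIRONMENT=production        # assumed environment name
consul kv put "config/${ENVIRONMENT}/consul/ports/http" 8500
consul kv put "config/${ENVIRONMENT}/consul/cluster/log_level" INFO
consul kv put "config/${ENVIRONMENT}/consul/nodes/warden/ip" 100.122.197.112
# Confirm what the template will see
consul kv get -recurse "config/${ENVIRONMENT}/consul/"
```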
@ -0,0 +1,50 @@
job "consul-clients-additional" {
|
||||
datacenters = ["dc1"]
|
||||
type = "service"
|
||||
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
operator = "regexp"
|
||||
value = "ch2|ch3|de"
|
||||
}
|
||||
|
||||
group "consul-client" {
|
||||
count = 3
|
||||
|
||||
task "consul-client" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "/usr/bin/consul"
|
||||
args = [
|
||||
"agent",
|
||||
"-config-dir=/etc/consul.d",
|
||||
"-data-dir=/opt/consul",
|
||||
"-node=${node.unique.name}",
|
||||
"-bind=${attr.unique.network.ip-address}",
|
||||
"-retry-join=warden.tailnet-68f9.ts.net:8301",
|
||||
"-retry-join=ch4.tailnet-68f9.ts.net:8301",
|
||||
"-retry-join=ash3c.tailnet-68f9.ts.net:8301",
|
||||
"-client=0.0.0.0"
|
||||
]
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 100
|
||||
memory = 128
|
||||
}
|
||||
|
||||
service {
|
||||
name = "consul-client"
|
||||
port = "http"
|
||||
|
||||
check {
|
||||
type = "http"
|
||||
path = "/v1/status/leader"
|
||||
interval = "30s"
|
||||
timeout = "5s"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
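A hedged sketch of deploying this job and confirming that the ch2/ch3/de clients joined the LAN gossip pool; the job file path is an assumption:

```bash
nomad job run components/consul/jobs/consul-clients-additional.nomad   # assumed path
nomad job status consul-clients-additional
# The new clients should appear alongside warden/ch4/ash3c in the member list
consul members -http-addr=http://warden.tailnet-68f9.ts.net:8500
```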
@ -0,0 +1,154 @@
|
|||
job "consul-clients-dedicated" {
|
||||
datacenters = ["dc1"]
|
||||
type = "service"
|
||||
|
||||
group "consul-client-hcp1" {
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
value = "hcp1"
|
||||
}
|
||||
|
||||
network {
|
||||
port "http" {
|
||||
static = 8500
|
||||
}
|
||||
}
|
||||
|
||||
task "consul-client" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "/usr/bin/consul"
|
||||
args = [
|
||||
"agent",
|
||||
"-data-dir=/opt/consul",
|
||||
"-node=hcp1",
|
||||
"-bind=100.97.62.111",
|
||||
"-advertise=100.97.62.111",
|
||||
"-retry-join=hcp1.tailnet-68f9.ts.net:80",
|
||||
"-client=0.0.0.0",
|
||||
"-http-port=8500",
|
||||
"-datacenter=dc1"
|
||||
]
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 100
|
||||
memory = 128
|
||||
}
|
||||
|
||||
service {
|
||||
name = "consul-client"
|
||||
port = "http"
|
||||
|
||||
check {
|
||||
type = "script"
|
||||
command = "consul"
|
||||
args = ["members"]
|
||||
interval = "10s"
|
||||
timeout = "3s"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
group "consul-client-influxdb1" {
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
value = "influxdb1"
|
||||
}
|
||||
|
||||
network {
|
||||
port "http" {
|
||||
static = 8500
|
||||
}
|
||||
}
|
||||
|
||||
task "consul-client" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "/usr/bin/consul"
|
||||
args = [
|
||||
"agent",
|
||||
"-data-dir=/opt/consul",
|
||||
"-node=influxdb1",
|
||||
"-bind=100.100.7.4",
|
||||
"-advertise=100.100.7.4",
|
||||
"-retry-join=hcp1.tailnet-68f9.ts.net:80",
|
||||
"-client=0.0.0.0",
|
||||
"-http-port=8500",
|
||||
"-datacenter=dc1"
|
||||
]
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 100
|
||||
memory = 128
|
||||
}
|
||||
|
||||
service {
|
||||
name = "consul-client"
|
||||
port = "http"
|
||||
|
||||
check {
|
||||
type = "script"
|
||||
command = "consul"
|
||||
args = ["members"]
|
||||
interval = "10s"
|
||||
timeout = "3s"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
group "consul-client-browser" {
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
value = "browser"
|
||||
}
|
||||
|
||||
network {
|
||||
port "http" {
|
||||
static = 8500
|
||||
}
|
||||
}
|
||||
|
||||
task "consul-client" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "/usr/bin/consul"
|
||||
args = [
|
||||
"agent",
|
||||
"-data-dir=/opt/consul",
|
||||
"-node=browser",
|
||||
"-bind=100.116.112.45",
|
||||
"-advertise=100.116.112.45",
|
||||
"-retry-join=hcp1.tailnet-68f9.ts.net:80",
|
||||
"-client=0.0.0.0",
|
||||
"-http-port=8500",
|
||||
"-datacenter=dc1"
|
||||
]
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 100
|
||||
memory = 128
|
||||
}
|
||||
|
||||
service {
|
||||
name = "consul-client"
|
||||
port = "http"
|
||||
|
||||
check {
|
||||
type = "script"
|
||||
command = "consul"
|
||||
args = ["members"]
|
||||
interval = "10s"
|
||||
timeout = "3s"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
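Each dedicated client binds its HTTP API to port 8500 on its own node, so a quick check goes through the Tailscale hostnames rather than localhost; a sketch:

```bash
# Query each dedicated client's HTTP API over the tailnet
for node in hcp1 influxdb1 browser; do
  echo "== ${node} =="
  curl -s "http://${node}.tailnet-68f9.ts.net:8500/v1/status/leader"
  echo
done
```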
@ -0,0 +1,66 @@
|
|||
job "consul-clients-dedicated" {
|
||||
datacenters = ["dc1"]
|
||||
type = "service"
|
||||
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
operator = "regexp"
|
||||
value = "hcp1|influxdb1|browser"
|
||||
}
|
||||
|
||||
group "consul-client" {
|
||||
count = 3
|
||||
|
||||
update {
|
||||
max_parallel = 3
|
||||
min_healthy_time = "5s"
|
||||
healthy_deadline = "2m"
|
||||
progress_deadline = "5m"
|
||||
auto_revert = false
|
||||
}
|
||||
|
||||
network {
|
||||
port "http" {
|
||||
static = 8500
|
||||
}
|
||||
}
|
||||
|
||||
task "consul-client" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "/usr/bin/consul"
|
||||
args = [
|
||||
"agent",
|
||||
"-data-dir=/opt/consul",
|
||||
"-node=${node.unique.name}",
|
||||
"-bind=${attr.unique.network.ip-address}",
|
||||
"-advertise=${attr.unique.network.ip-address}",
|
||||
"-retry-join=warden.tailnet-68f9.ts.net:8301",
|
||||
"-retry-join=ch4.tailnet-68f9.ts.net:8301",
|
||||
"-retry-join=ash3c.tailnet-68f9.ts.net:8301",
|
||||
"-client=0.0.0.0",
|
||||
"-http-port=${NOMAD_PORT_http}",
|
||||
"-datacenter=dc1"
|
||||
]
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 100
|
||||
memory = 128
|
||||
}
|
||||
|
||||
service {
|
||||
name = "consul-client"
|
||||
port = "http"
|
||||
|
||||
check {
|
||||
type = "http"
|
||||
path = "/v1/status/leader"
|
||||
interval = "10s"
|
||||
timeout = "3s"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -0,0 +1,43 @@
|
|||
job "consul-clients" {
|
||||
datacenters = ["dc1"]
|
||||
type = "system"
|
||||
|
||||
group "consul-client" {
|
||||
count = 0 # system job, runs on all nodes
|
||||
|
||||
task "consul-client" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "/usr/bin/consul"
|
||||
args = [
|
||||
"agent",
|
||||
"-config-dir=/etc/consul.d",
|
||||
"-data-dir=/opt/consul",
|
||||
"-node=${node.unique.name}",
|
||||
"-bind=${attr.unique.network.ip-address}",
|
||||
"-retry-join=warden.tailnet-68f9.ts.net:8301",
|
||||
"-retry-join=ch4.tailnet-68f9.ts.net:8301",
|
||||
"-retry-join=ash3c.tailnet-68f9.ts.net:8301"
|
||||
]
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 100
|
||||
memory = 128
|
||||
}
|
||||
|
||||
service {
|
||||
name = "consul-client"
|
||||
port = "http"
|
||||
|
||||
check {
|
||||
type = "http"
|
||||
path = "/v1/status/leader"
|
||||
interval = "30s"
|
||||
timeout = "5s"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
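Because the job type is `system`, Nomad places one allocation on every eligible client rather than honoring a fixed count; a sketch of confirming the spread after submission (the file path is assumed):

```bash
nomad job run components/consul/jobs/consul-clients.nomad   # assumed path
nomad job status consul-clients                             # expect one allocation per client node
nomad node status                                           # cross-check against the node list
```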
@ -0,0 +1,115 @@
|
|||
job "consul-cluster-nomad" {
|
||||
datacenters = ["dc1"]
|
||||
type = "service"
|
||||
|
||||
group "consul-ch4" {
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
value = "ch4"
|
||||
}
|
||||
|
||||
task "consul" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "consul"
|
||||
args = [
|
||||
"agent",
|
||||
"-server",
|
||||
"-bootstrap-expect=3",
|
||||
"-data-dir=/opt/nomad/data/consul",
|
||||
"-client=0.0.0.0",
|
||||
"-bind=100.117.106.136",
|
||||
"-advertise=100.117.106.136",
|
||||
"-retry-join=100.116.80.94",
|
||||
"-retry-join=100.122.197.112",
|
||||
"-ui",
|
||||
"-http-port=8500",
|
||||
"-server-port=8300",
|
||||
"-serf-lan-port=8301",
|
||||
"-serf-wan-port=8302"
|
||||
]
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 300
|
||||
memory = 512
|
||||
}
|
||||
|
||||
}
|
||||
}
|
||||
|
||||
group "consul-ash3c" {
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
value = "ash3c"
|
||||
}
|
||||
|
||||
task "consul" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "consul"
|
||||
args = [
|
||||
"agent",
|
||||
"-server",
|
||||
"-bootstrap-expect=3",
|
||||
"-data-dir=/opt/nomad/data/consul",
|
||||
"-client=0.0.0.0",
|
||||
"-bind=100.116.80.94",
|
||||
"-advertise=100.116.80.94",
|
||||
"-retry-join=100.117.106.136",
|
||||
"-retry-join=100.122.197.112",
|
||||
"-ui",
|
||||
"-http-port=8500",
|
||||
"-server-port=8300",
|
||||
"-serf-lan-port=8301",
|
||||
"-serf-wan-port=8302"
|
||||
]
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 300
|
||||
memory = 512
|
||||
}
|
||||
|
||||
}
|
||||
}
|
||||
|
||||
group "consul-warden" {
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
value = "warden"
|
||||
}
|
||||
|
||||
task "consul" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "consul"
|
||||
args = [
|
||||
"agent",
|
||||
"-server",
|
||||
"-bootstrap-expect=3",
|
||||
"-data-dir=/opt/nomad/data/consul",
|
||||
"-client=0.0.0.0",
|
||||
"-bind=100.122.197.112",
|
||||
"-advertise=100.122.197.112",
|
||||
"-retry-join=100.117.106.136",
|
||||
"-retry-join=100.116.80.94",
|
||||
"-ui",
|
||||
"-http-port=8500",
|
||||
"-server-port=8300",
|
||||
"-serf-lan-port=8301",
|
||||
"-serf-wan-port=8302"
|
||||
]
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 300
|
||||
memory = 512
|
||||
}
|
||||
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
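With `-bootstrap-expect=3`, the three server groups (ch4, ash3c, warden) should elect a leader once all of them are running; a verification sketch over the Tailscale addresses used in the job:

```bash
# Leader and peer set should list the three server addresses on port 8300
curl -s http://100.117.106.136:8500/v1/status/leader
curl -s http://100.117.106.136:8500/v1/status/peers
consul operator raft list-peers -http-addr=http://100.117.106.136:8500
```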
@ -0,0 +1,66 @@
|
|||
job "consul-ui-service" {
|
||||
datacenters = ["dc1"]
|
||||
type = "service"
|
||||
|
||||
group "consul-ui" {
|
||||
count = 1
|
||||
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
value = "warden"
|
||||
}
|
||||
|
||||
network {
|
||||
mode = "host"
|
||||
port "http" {
|
||||
static = 8500
|
||||
host_network = "tailscale0"
|
||||
}
|
||||
}
|
||||
|
||||
service {
|
||||
name = "consul-ui"
|
||||
port = "http"
|
||||
|
||||
tags = [
|
||||
"traefik.enable=true",
|
||||
"traefik.http.routers.consul-ui.rule=PathPrefix(`/consul`)",
|
||||
"traefik.http.routers.consul-ui.priority=100"
|
||||
]
|
||||
|
||||
check {
|
||||
type = "http"
|
||||
path = "/v1/status/leader"
|
||||
interval = "10s"
|
||||
timeout = "2s"
|
||||
}
|
||||
}
|
||||
|
||||
task "consul-ui" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "/usr/bin/consul"
|
||||
args = [
|
||||
"agent",
|
||||
"-server",
|
||||
"-bootstrap-expect=3",
|
||||
"-data-dir=/opt/nomad/data/consul",
|
||||
"-client=0.0.0.0",
|
||||
"-bind=100.122.197.112",
|
||||
"-advertise=100.122.197.112",
|
||||
"-retry-join=100.117.106.136",
|
||||
"-retry-join=100.116.80.94",
|
||||
"-ui",
|
||||
"-http-port=8500"
|
||||
]
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 300
|
||||
memory = 512
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
|
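The `traefik.*` tags register this service with Traefik's Consul Catalog provider, so once a Traefik instance picks it up the UI should answer under the `/consul` path prefix; a quick check, assuming the web entrypoint is port 80 on hcp1:

```bash
# Direct check against the warden-pinned agent
curl -s http://warden.tailnet-68f9.ts.net:8500/v1/status/leader
# Through the PathPrefix(`/consul`) router (entrypoint location is an assumption)
curl -s http://hcp1.tailnet-68f9.ts.net/consul/v1/status/leader
```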
@ -0,0 +1,8 @@
# Nomad configuration

## Jobs

- `install-podman-driver.nomad` - installs the Podman driver
- `nomad-consul-config.nomad` - Nomad-Consul configuration
- `nomad-consul-setup.nomad` - Nomad-Consul setup
- `nomad-nfs-volume.nomad` - NFS volume configuration
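A minimal sketch of running the jobs listed above; the directory layout relative to the repository root is assumed:

```bash
# Assumed paths relative to the repository root
nomad job run components/nomad/jobs/install-podman-driver.nomad
nomad job run components/nomad/jobs/nomad-consul-config.nomad
nomad job status   # confirm the jobs converge
```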
@ -0,0 +1,110 @@
job "install-podman-driver" {
|
||||
datacenters = ["dc1"]
|
||||
type = "system" # 在所有节点上运行
|
||||
|
||||
group "install" {
|
||||
task "install-podman" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "bash"
|
||||
args = [
|
||||
"-c",
|
||||
<<-EOF
|
||||
set -euo pipefail
|
||||
export PATH="/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin"
|
||||
|
||||
# 依赖工具
|
||||
if ! command -v jq >/dev/null 2>&1 || ! command -v unzip >/dev/null 2>&1 || ! command -v wget >/dev/null 2>&1; then
|
||||
echo "Installing dependencies (jq unzip wget)..."
|
||||
sudo -n apt update -y || true
|
||||
sudo -n apt install -y jq unzip wget || true
|
||||
fi
|
||||
|
||||
# 安装 Podman(若未安装)
|
||||
if ! command -v podman >/dev/null 2>&1; then
|
||||
echo "Installing Podman..."
|
||||
sudo -n apt update -y || true
|
||||
sudo -n apt install -y podman || true
|
||||
sudo -n systemctl enable podman || true
|
||||
else
|
||||
echo "Podman already installed"
|
||||
fi
|
||||
|
||||
# 启用并启动 podman.socket,确保 Nomad 可访问
|
||||
sudo -n systemctl enable --now podman.socket || true
|
||||
if getent group podman >/dev/null 2>&1; then
|
||||
sudo -n usermod -aG podman nomad || true
|
||||
fi
|
||||
|
||||
# 安装 Nomad Podman 驱动插件(始终确保存在)
|
||||
PODMAN_DRIVER_VERSION="0.6.1"
|
||||
PLUGIN_DIR="/opt/nomad/data/plugins"
|
||||
sudo -n mkdir -p "${PLUGIN_DIR}" || true
|
||||
cd /tmp
|
||||
if [ ! -x "${PLUGIN_DIR}/nomad-driver-podman" ]; then
|
||||
echo "Installing nomad-driver-podman ${PODMAN_DRIVER_VERSION}..."
|
||||
wget -q "https://releases.hashicorp.com/nomad-driver-podman/${PODMAN_DRIVER_VERSION}/nomad-driver-podman_${PODMAN_DRIVER_VERSION}_linux_amd64.zip"
|
||||
unzip -o "nomad-driver-podman_${PODMAN_DRIVER_VERSION}_linux_amd64.zip"
|
||||
sudo -n mv -f nomad-driver-podman "${PLUGIN_DIR}/"
|
||||
sudo -n chmod +x "${PLUGIN_DIR}/nomad-driver-podman"
|
||||
sudo -n chown -R nomad:nomad "${PLUGIN_DIR}"
|
||||
rm -f "nomad-driver-podman_${PODMAN_DRIVER_VERSION}_linux_amd64.zip"
|
||||
else
|
||||
echo "nomad-driver-podman already present in ${PLUGIN_DIR}"
|
||||
fi
|
||||
|
||||
# 更新 /etc/nomad.d/nomad.hcl 的 plugin_dir 设置
|
||||
if [ -f /etc/nomad.d/nomad.hcl ]; then
|
||||
if grep -q "^plugin_dir\s*=\s*\"" /etc/nomad.d/nomad.hcl; then
|
||||
sudo -n sed -i 's#^plugin_dir\s*=\s*\".*\"#plugin_dir = "/opt/nomad/data/plugins"#' /etc/nomad.d/nomad.hcl || true
|
||||
else
|
||||
echo 'plugin_dir = "/opt/nomad/data/plugins"' | sudo -n tee -a /etc/nomad.d/nomad.hcl >/dev/null || true
|
||||
fi
|
||||
fi
|
||||
|
||||
# 重启 Nomad 服务以加载插件
|
||||
sudo -n systemctl restart nomad || true
|
||||
echo "Waiting for Nomad to restart..."
|
||||
sleep 15
|
||||
|
||||
# 检查 Podman 驱动是否被 Nomad 检测到
|
||||
if /usr/local/bin/nomad node status -self -json 2>/dev/null | jq -r '.Drivers.podman.Detected' | grep -q "true"; then
|
||||
echo "Podman driver successfully loaded"
|
||||
exit 0
|
||||
fi
|
||||
|
||||
echo "Podman driver not detected yet, retrying once after socket restart..."
|
||||
sudo -n systemctl restart podman.socket || true
|
||||
sleep 5
|
||||
if /usr/local/bin/nomad node status -self -json 2>/dev/null | jq -r '.Drivers.podman.Detected' | grep -q "true"; then
|
||||
echo "Podman driver successfully loaded after socket restart"
|
||||
exit 0
|
||||
else
|
||||
echo "Podman driver still not detected; manual investigation may be required"
|
||||
exit 1
|
||||
fi
|
||||
EOF
|
||||
]
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 200
|
||||
memory = 256
|
||||
}
|
||||
|
||||
// 以root权限运行
|
||||
// user = "root"
|
||||
# 使用 nomad 用户运行任务,避免客户端策略禁止 root
|
||||
user = "nomad"
|
||||
|
||||
# 确保任务成功完成
|
||||
restart {
|
||||
attempts = 1
|
||||
interval = "24h"
|
||||
delay = "60s"
|
||||
mode = "fail"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
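The same detection check the install script runs can be repeated by hand on any node after the job completes; a sketch:

```bash
# The Podman socket must be active for Nomad to fingerprint the driver
systemctl is-active podman.socket
# Nomad should now report the podman driver as detected on this node
nomad node status -self -json | jq '.Drivers.podman.Detected'
```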
@ -0,0 +1,55 @@
|
|||
job "nomad-consul-config" {
|
||||
datacenters = ["dc1"]
|
||||
type = "system"
|
||||
|
||||
group "nomad-server-config" {
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
operator = "regexp"
|
||||
value = "semaphore|ash1d|ash2e|ch2|ch3|onecloud1|de"
|
||||
}
|
||||
|
||||
task "update-nomad-config" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "sh"
|
||||
args = [
|
||||
"-c",
|
||||
"sed -i '/^consul {/,/^}/c\\consul {\\n address = \"ch4.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500\"\\n server_service_name = \"nomad\"\\n client_service_name = \"nomad-client\"\\n auto_advertise = true\\n server_auto_join = true\\n client_auto_join = false\\n}' /etc/nomad.d/nomad.hcl && systemctl restart nomad"
|
||||
]
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 100
|
||||
memory = 128
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
group "nomad-client-config" {
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
operator = "regexp"
|
||||
value = "ch4|ash3c|browser|influxdb1|hcp1|warden"
|
||||
}
|
||||
|
||||
task "update-nomad-config" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "sh"
|
||||
args = [
|
||||
"-c",
|
||||
"sed -i '/^consul {/,/^}/c\\consul {\\n address = \"ch4.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500\"\\n server_service_name = \"nomad\"\\n client_service_name = \"nomad-client\"\\n auto_advertise = true\\n server_auto_join = false\\n client_auto_join = true\\n}' /etc/nomad.d/nomad.hcl && systemctl restart nomad"
|
||||
]
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 100
|
||||
memory = 128
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
|
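After the sed rewrite runs, the `consul {}` block in `/etc/nomad.d/nomad.hcl` should carry the three-address list and the role-appropriate auto-join flags; a quick inspection sketch on any affected node:

```bash
# Show the rewritten consul block and confirm Nomad came back after the restart
grep -A 7 '^consul {' /etc/nomad.d/nomad.hcl
systemctl status nomad --no-pager | head -n 5
```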
@ -0,0 +1,23 @@
|
|||
job "nomad-consul-setup" {
|
||||
datacenters = ["dc1"]
|
||||
type = "system"
|
||||
|
||||
group "nomad-config" {
|
||||
task "setup-consul" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "sh"
|
||||
args = [
|
||||
"-c",
|
||||
"if grep -q 'server.*enabled.*true' /etc/nomad.d/nomad.hcl; then sed -i '/^consul {/,/^}/c\\consul {\\n address = \"ch4.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500\"\\n server_service_name = \"nomad\"\\n client_service_name = \"nomad-client\"\\n auto_advertise = true\\n server_auto_join = true\\n client_auto_join = false\\n}' /etc/nomad.d/nomad.hcl; else sed -i '/^consul {/,/^}/c\\consul {\\n address = \"ch4.tailnet-68f9.ts.net:8500,ash3c.tailnet-68f9.ts.net:8500,warden.tailnet-68f9.ts.net:8500\"\\n server_service_name = \"nomad\"\\n client_service_name = \"nomad-client\"\\n auto_advertise = true\\n server_auto_join = false\\n client_auto_join = true\\n}' /etc/nomad.d/nomad.hcl; fi && systemctl restart nomad"
|
||||
]
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 100
|
||||
memory = 128
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -0,0 +1,34 @@
|
|||
job "nfs-volume-example" {
|
||||
datacenters = ["dc1"]
|
||||
type = "service"
|
||||
|
||||
group "nfs-app" {
|
||||
count = 1
|
||||
|
||||
volume "nfs-shared" {
|
||||
type = "host"
|
||||
source = "nfs-shared"
|
||||
read_only = false
|
||||
}
|
||||
|
||||
task "app" {
|
||||
driver = "podman"
|
||||
|
||||
config {
|
||||
image = "alpine:latest"
|
||||
args = ["tail", "-f", "/dev/null"]
|
||||
}
|
||||
|
||||
volume_mount {
|
||||
volume = "nfs-shared"
|
||||
destination = "/shared"
|
||||
read_only = false
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 100
|
||||
memory = 64
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
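The `nfs-shared` host volume referenced above must already be declared in each client's configuration before the scheduler can place the group; a hedged sketch of that client-side stanza (the config file name and mount path are assumptions):

```bash
# Declare the host volume on every client that should run the group, then restart Nomad.
sudo tee /etc/nomad.d/host-volume-nfs.hcl > /dev/null <<'EOF'
client {
  host_volume "nfs-shared" {
    path      = "/mnt/nfs/shared"   # assumed NFS mount point on the client
    read_only = false
  }
}
EOF
sudo systemctl restart nomad
```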
@ -0,0 +1,28 @@
# Traefik configuration

## Deployment

```bash
nomad job run components/traefik/jobs/traefik.nomad
```

## Configuration highlights

- Binds explicitly to the Tailscale IP (100.97.62.111)
- Geography-aware Consul cluster ordering (Beijing → South Korea → United States)
- Relaxed health checks suited to trans-Pacific links
- No service health check, to avoid flapping

## Access

- Dashboard: `http://hcp1.tailnet-68f9.ts.net:8080/dashboard/`
- Direct IP: `http://100.97.62.111:8080/dashboard/`
- Consul LB: `http://hcp1.tailnet-68f9.ts.net:80`

## Troubleshooting

If services start flapping:
1. Check whether RFC1918 private addresses are being used
2. Confirm Tailscale network connectivity
3. Increase the health-check interval
4. Account for the latency introduced by geographic distance
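A sketch of spot-checking the access points listed above once the job is running; the `/consul` path on the LB assumes the path-prefix router defined in the Traefik dynamic configuration:

```bash
curl -sI http://hcp1.tailnet-68f9.ts.net:8080/dashboard/ | head -n 1   # dashboard
curl -sI http://100.97.62.111:8080/dashboard/ | head -n 1              # direct Tailscale IP
curl -s  http://hcp1.tailnet-68f9.ts.net:80/consul/v1/status/leader    # Consul via the LB (path assumed)
```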
|
@ -0,0 +1,28 @@
|
|||
job "test-simple" {
|
||||
datacenters = ["dc1"]
|
||||
type = "service"
|
||||
|
||||
group "test" {
|
||||
count = 1
|
||||
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
value = "warden"
|
||||
}
|
||||
|
||||
task "test" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "sleep"
|
||||
args = ["3600"]
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 100
|
||||
memory = 64
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
|
@ -0,0 +1,213 @@
|
|||
job "traefik-cloudflare-v1" {
|
||||
datacenters = ["dc1"]
|
||||
type = "service"
|
||||
|
||||
group "traefik" {
|
||||
count = 1
|
||||
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
value = "hcp1"
|
||||
}
|
||||
|
||||
|
||||
network {
|
||||
mode = "host"
|
||||
port "http" {
|
||||
static = 80
|
||||
host_network = "tailscale0"
|
||||
}
|
||||
port "https" {
|
||||
static = 443
|
||||
host_network = "tailscale0"
|
||||
}
|
||||
port "traefik" {
|
||||
static = 8080
|
||||
host_network = "tailscale0"
|
||||
}
|
||||
}
|
||||
|
||||
task "traefik" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "/usr/local/bin/traefik"
|
||||
args = [
|
||||
"--configfile=/local/traefik.yml"
|
||||
]
|
||||
}
|
||||
|
||||
template {
|
||||
data = <<EOF
|
||||
api:
|
||||
dashboard: true
|
||||
insecure: true
|
||||
|
||||
entryPoints:
|
||||
web:
|
||||
address: "0.0.0.0:80"
|
||||
http:
|
||||
redirections:
|
||||
entrypoint:
|
||||
to: websecure
|
||||
scheme: https
|
||||
permanent: true
|
||||
websecure:
|
||||
address: "0.0.0.0:443"
|
||||
traefik:
|
||||
address: "0.0.0.0:8080"
|
||||
|
||||
providers:
|
||||
consulCatalog:
|
||||
endpoint:
|
||||
address: "warden.tailnet-68f9.ts.net:8500"
|
||||
scheme: "http"
|
||||
watch: true
|
||||
exposedByDefault: false
|
||||
prefix: "traefik"
|
||||
defaultRule: "Host(`{{ .Name }}.git4ta.me`)"
|
||||
file:
|
||||
filename: /local/dynamic.yml
|
||||
watch: true
|
||||
|
||||
certificatesResolvers:
|
||||
cloudflare:
|
||||
acme:
|
||||
email: houzhongxu.houzhongxu@gmail.com
|
||||
storage: /local/acme.json
|
||||
dnsChallenge:
|
||||
provider: cloudflare
|
||||
delayBeforeCheck: 30s
|
||||
resolvers:
|
||||
- "1.1.1.1:53"
|
||||
- "1.0.0.1:53"
|
||||
|
||||
log:
|
||||
level: DEBUG
|
||||
EOF
|
||||
destination = "local/traefik.yml"
|
||||
}
|
||||
|
||||
template {
|
||||
data = <<EOF
|
||||
http:
|
||||
serversTransports:
|
||||
waypoint-insecure:
|
||||
insecureSkipVerify: true
|
||||
|
||||
middlewares:
|
||||
consul-stripprefix:
|
||||
stripPrefix:
|
||||
prefixes:
|
||||
- "/consul"
|
||||
waypoint-auth:
|
||||
replacePathRegex:
|
||||
regex: "^/auth/token(.*)$"
|
||||
replacement: "/auth/token$1"
|
||||
|
||||
services:
|
||||
consul-cluster:
|
||||
loadBalancer:
|
||||
servers:
|
||||
- url: "http://warden.tailnet-68f9.ts.net:8500" # 北京,优先
|
||||
- url: "http://ch4.tailnet-68f9.ts.net:8500" # 韩国,备用
|
||||
- url: "http://ash3c.tailnet-68f9.ts.net:8500" # 美国,备用
|
||||
healthCheck:
|
||||
path: "/v1/status/leader"
|
||||
interval: "30s"
|
||||
timeout: "15s"
|
||||
|
||||
nomad-cluster:
|
||||
loadBalancer:
|
||||
servers:
|
||||
- url: "http://warden.tailnet-68f9.ts.net:4646" # 北京,优先
|
||||
- url: "http://ch4.tailnet-68f9.ts.net:4646" # 韩国,备用
|
||||
- url: "http://ash3c.tailnet-68f9.ts.net:4646" # 美国,备用
|
||||
healthCheck:
|
||||
path: "/v1/status/leader"
|
||||
interval: "30s"
|
||||
timeout: "15s"
|
||||
|
||||
waypoint-cluster:
|
||||
loadBalancer:
|
||||
servers:
|
||||
- url: "https://hcp1.tailnet-68f9.ts.net:9701" # hcp1 节点 HTTPS API
|
||||
serversTransport: waypoint-insecure
|
||||
|
||||
vault-cluster:
|
||||
loadBalancer:
|
||||
servers:
|
||||
- url: "http://ch4.tailnet-68f9.ts.net:8200" # 韩国,活跃节点
|
||||
- url: "http://ash3c.tailnet-68f9.ts.net:8200" # 美国,备用节点
|
||||
- url: "http://warden.tailnet-68f9.ts.net:8200" # 北京,备用节点
|
||||
healthCheck:
|
||||
path: "/v1/sys/health"
|
||||
interval: "30s"
|
||||
timeout: "15s"
|
||||
|
||||
routers:
|
||||
consul-api:
|
||||
rule: "Host(`consul.git4ta.me`)"
|
||||
service: consul-cluster
|
||||
middlewares:
|
||||
- consul-stripprefix
|
||||
entryPoints:
|
||||
- websecure
|
||||
tls:
|
||||
certResolver: cloudflare
|
||||
|
||||
traefik-dashboard:
|
||||
rule: "Host(`traefik.git4ta.me`)"
|
||||
service: dashboard@internal
|
||||
middlewares:
|
||||
- dashboard_redirect@internal
|
||||
- dashboard_stripprefix@internal
|
||||
entryPoints:
|
||||
- websecure
|
||||
tls:
|
||||
certResolver: cloudflare
|
||||
|
||||
nomad-ui:
|
||||
rule: "Host(`nomad.git4ta.me`)"
|
||||
service: nomad-cluster
|
||||
entryPoints:
|
||||
- websecure
|
||||
tls:
|
||||
certResolver: cloudflare
|
||||
|
||||
waypoint-ui:
|
||||
rule: "Host(`waypoint.git4ta.me`)"
|
||||
service: waypoint-cluster
|
||||
entryPoints:
|
||||
- websecure
|
||||
tls:
|
||||
certResolver: cloudflare
|
||||
|
||||
vault-ui:
|
||||
rule: "Host(`vault.git4ta.me`)"
|
||||
service: vault-cluster
|
||||
entryPoints:
|
||||
- websecure
|
||||
tls:
|
||||
certResolver: cloudflare
|
||||
EOF
|
||||
destination = "local/dynamic.yml"
|
||||
}
|
||||
|
||||
template {
|
||||
data = <<EOF
|
||||
CLOUDFLARE_EMAIL=houzhongxu.houzhongxu@gmail.com
|
||||
CLOUDFLARE_DNS_API_TOKEN=HYT-cfZTP_jq6Xd9g3tpFMwxopOyIrf8LZpmGAI3
|
||||
CLOUDFLARE_ZONE_API_TOKEN=HYT-cfZTP_jq6Xd9g3tpFMwxopOyIrf8LZpmGAI3
|
||||
EOF
|
||||
destination = "local/cloudflare.env"
|
||||
env = true
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 500
|
||||
memory = 512
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
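Once the ACME DNS-01 challenge completes, the Host-based routers should serve valid certificates; a hedged verification sketch that pins resolution to the hcp1 Tailscale address in case public DNS does not point there:

```bash
# TLS routers defined in the dynamic configuration
for host in consul.git4ta.me traefik.git4ta.me nomad.git4ta.me vault.git4ta.me; do
  echo "== ${host} =="
  curl -skI "https://${host}" --resolve "${host}:443:100.97.62.111" | head -n 1
done
```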
@ -0,0 +1,217 @@
|
|||
job "traefik-consul-kv" {
|
||||
datacenters = ["dc1"]
|
||||
type = "service"
|
||||
|
||||
group "traefik" {
|
||||
count = 1
|
||||
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
value = "hcp1"
|
||||
}
|
||||
|
||||
network {
|
||||
mode = "host"
|
||||
port "http" {
|
||||
static = 80
|
||||
host_network = "tailscale0"
|
||||
}
|
||||
port "traefik" {
|
||||
static = 8080
|
||||
host_network = "tailscale0"
|
||||
}
|
||||
}
|
||||
|
||||
task "traefik" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "/usr/local/bin/traefik"
|
||||
args = [
|
||||
"--configfile=/local/traefik.yml"
|
||||
]
|
||||
}
|
||||
|
||||
template {
|
||||
data = <<EOF
|
||||
api:
|
||||
dashboard: true
|
||||
insecure: true
|
||||
|
||||
entryPoints:
|
||||
web:
|
||||
address: "0.0.0.0:80"
|
||||
traefik:
|
||||
address: "0.0.0.0:8080"
|
||||
|
||||
providers:
|
||||
consulCatalog:
|
||||
endpoint:
|
||||
address: "warden.tailnet-68f9.ts.net:8500"
|
||||
scheme: "http"
|
||||
watch: true
|
||||
file:
|
||||
filename: /local/dynamic.yml
|
||||
watch: true
|
||||
|
||||
metrics:
|
||||
prometheus:
|
||||
addEntryPointsLabels: true
|
||||
addServicesLabels: true
|
||||
addRoutersLabels: true
|
||||
|
||||
log:
|
||||
level: INFO
|
||||
EOF
|
||||
destination = "local/traefik.yml"
|
||||
}
|
||||
|
||||
template {
|
||||
data = <<EOF
|
||||
http:
|
||||
middlewares:
|
||||
consul-stripprefix:
|
||||
stripPrefix:
|
||||
prefixes:
|
||||
- "/consul"
|
||||
|
||||
traefik-stripprefix:
|
||||
stripPrefix:
|
||||
prefixes:
|
||||
- "/traefik"
|
||||
|
||||
nomad-stripprefix:
|
||||
stripPrefix:
|
||||
prefixes:
|
||||
- "/nomad"
|
||||
|
||||
consul-redirect:
|
||||
redirectRegex:
|
||||
regex: "^/consul/?$"
|
||||
replacement: "/consul/ui/"
|
||||
permanent: false
|
||||
|
||||
nomad-redirect:
|
||||
redirectRegex:
|
||||
regex: "^/nomad/?$"
|
||||
replacement: "/nomad/ui/"
|
||||
permanent: false
|
||||
|
||||
traefik-redirect:
|
||||
redirectRegex:
|
||||
regex: "^/traefik/?$"
|
||||
replacement: "/traefik/dashboard/"
|
||||
permanent: false
|
||||
|
||||
services:
|
||||
consul-cluster:
|
||||
loadBalancer:
|
||||
servers:
|
||||
- url: "http://warden.tailnet-68f9.ts.net:8500" # 北京,优先
|
||||
- url: "http://ch4.tailnet-68f9.ts.net:8500" # 韩国,备用
|
||||
- url: "http://ash3c.tailnet-68f9.ts.net:8500" # 美国,备用
|
||||
healthCheck:
|
||||
path: "/v1/status/leader"
|
||||
interval: "30s"
|
||||
timeout: "15s"
|
||||
|
||||
nomad-cluster:
|
||||
loadBalancer:
|
||||
servers:
|
||||
- url: "http://ch2.tailnet-68f9.ts.net:4646" # Nomad server leader
|
||||
healthCheck:
|
||||
path: "/v1/status/leader"
|
||||
interval: "30s"
|
||||
timeout: "15s"
|
||||
|
||||
routers:
|
||||
consul-redirect:
|
||||
rule: "Path(`/consul`) || Path(`/consul/`)"
|
||||
service: consul-cluster
|
||||
middlewares:
|
||||
- consul-redirect
|
||||
entryPoints:
|
||||
- web
|
||||
priority: 100
|
||||
|
||||
consul-ui:
|
||||
rule: "PathPrefix(`/consul/ui`)"
|
||||
service: consul-cluster
|
||||
middlewares:
|
||||
- consul-stripprefix
|
||||
entryPoints:
|
||||
- web
|
||||
priority: 5
|
||||
|
||||
consul-api:
|
||||
rule: "PathPrefix(`/consul/v1`)"
|
||||
service: consul-cluster
|
||||
middlewares:
|
||||
- consul-stripprefix
|
||||
entryPoints:
|
||||
- web
|
||||
priority: 5
|
||||
|
||||
traefik-api:
|
||||
rule: "PathPrefix(`/traefik/api`)"
|
||||
service: api@internal
|
||||
middlewares:
|
||||
- traefik-stripprefix
|
||||
entryPoints:
|
||||
- web
|
||||
priority: 6
|
||||
|
||||
traefik-dashboard:
|
||||
rule: "PathPrefix(`/traefik/dashboard`)"
|
||||
service: dashboard@internal
|
||||
middlewares:
|
||||
- traefik-stripprefix
|
||||
entryPoints:
|
||||
- web
|
||||
priority: 5
|
||||
|
||||
traefik-redirect:
|
||||
rule: "Path(`/traefik`) || Path(`/traefik/`)"
|
||||
middlewares:
|
||||
- "traefik-redirect"
|
||||
entryPoints:
|
||||
- web
|
||||
priority: 100
|
||||
|
||||
nomad-redirect:
|
||||
rule: "Path(`/nomad`) || Path(`/nomad/`)"
|
||||
service: nomad-cluster
|
||||
middlewares:
|
||||
- nomad-redirect
|
||||
entryPoints:
|
||||
- web
|
||||
priority: 100
|
||||
|
||||
nomad-ui:
|
||||
rule: "PathPrefix(`/nomad/ui`)"
|
||||
service: nomad-cluster
|
||||
middlewares:
|
||||
- nomad-stripprefix
|
||||
entryPoints:
|
||||
- web
|
||||
priority: 5
|
||||
|
||||
nomad-api:
|
||||
rule: "PathPrefix(`/nomad/v1`)"
|
||||
service: nomad-cluster
|
||||
middlewares:
|
||||
- nomad-stripprefix
|
||||
entryPoints:
|
||||
- web
|
||||
priority: 5
|
||||
EOF
|
||||
destination = "local/dynamic.yml"
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 500
|
||||
memory = 512
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
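The path-prefixed routers above can be exercised directly against the web entrypoint on hcp1; a sketch:

```bash
BASE=http://hcp1.tailnet-68f9.ts.net
curl -s  "$BASE/consul/v1/status/leader"            # Consul API behind the stripprefix
curl -sI "$BASE/nomad/ui/" | head -n 1              # Nomad UI
curl -s  "$BASE/traefik/api/rawdata" | head -c 200  # Traefik API
```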
@ -0,0 +1,150 @@
|
|||
job "traefik-consul-lb" {
|
||||
datacenters = ["dc1"]
|
||||
type = "service"
|
||||
|
||||
group "traefik" {
|
||||
count = 1
|
||||
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
value = "warden"
|
||||
}
|
||||
|
||||
update {
|
||||
min_healthy_time = "60s"
|
||||
healthy_deadline = "5m"
|
||||
progress_deadline = "10m"
|
||||
auto_revert = false
|
||||
}
|
||||
|
||||
network {
|
||||
mode = "host"
|
||||
port "http" {
|
||||
static = 80
|
||||
host_network = "tailscale0"
|
||||
}
|
||||
port "traefik" {
|
||||
static = 8080
|
||||
host_network = "tailscale0"
|
||||
}
|
||||
}
|
||||
|
||||
task "traefik" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "/usr/local/bin/traefik"
|
||||
args = [
|
||||
"--configfile=/local/traefik.yml"
|
||||
]
|
||||
}
|
||||
|
||||
template {
|
||||
data = <<EOF
|
||||
api:
|
||||
dashboard: true
|
||||
insecure: true
|
||||
|
||||
entryPoints:
|
||||
web:
|
||||
address: "hcp1.tailnet-68f9.ts.net:80"
|
||||
traefik:
|
||||
address: "100.97.62.111:8080"
|
||||
|
||||
providers:
|
||||
file:
|
||||
filename: /local/dynamic.yml
|
||||
watch: true
|
||||
|
||||
metrics:
|
||||
prometheus:
|
||||
addEntryPointsLabels: true
|
||||
addServicesLabels: true
|
||||
addRoutersLabels: true
|
||||
|
||||
log:
|
||||
level: INFO
|
||||
EOF
|
||||
destination = "local/traefik.yml"
|
||||
}
|
||||
|
||||
template {
|
||||
data = <<EOF
|
||||
http:
|
||||
middlewares:
|
||||
consul-stripprefix:
|
||||
stripPrefix:
|
||||
prefixes:
|
||||
- "/consul"
|
||||
|
||||
traefik-stripprefix:
|
||||
stripPrefix:
|
||||
prefixes:
|
||||
- "/traefik"
|
||||
|
||||
services:
|
||||
consul-cluster:
|
||||
loadBalancer:
|
||||
servers:
|
||||
- url: "http://warden.tailnet-68f9.ts.net:8500" # 北京,优先
|
||||
- url: "http://ch4.tailnet-68f9.ts.net:8500" # 韩国,备用
|
||||
- url: "http://ash3c.tailnet-68f9.ts.net:8500" # 美国,备用
|
||||
healthCheck:
|
||||
path: "/v1/status/leader"
|
||||
interval: "30s"
|
||||
timeout: "15s"
|
||||
|
||||
routers:
|
||||
consul-api:
|
||||
rule: "PathPrefix(`/consul`)"
|
||||
service: consul-cluster
|
||||
middlewares:
|
||||
- consul-stripprefix
|
||||
entryPoints:
|
||||
- web
|
||||
|
||||
traefik-dashboard:
|
||||
rule: "PathPrefix(`/traefik`)"
|
||||
service: dashboard@internal
|
||||
middlewares:
|
||||
- traefik-stripprefix
|
||||
entryPoints:
|
||||
- web
|
||||
EOF
|
||||
destination = "local/dynamic.yml"
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 500
|
||||
memory = 512
|
||||
}
|
||||
|
||||
service {
|
||||
name = "consul-lb"
|
||||
port = "http"
|
||||
|
||||
check {
|
||||
name = "consul-lb-health"
|
||||
type = "http"
|
||||
path = "/consul/v1/status/leader"
|
||||
interval = "30s"
|
||||
timeout = "5s"
|
||||
}
|
||||
}
|
||||
|
||||
service {
|
||||
name = "traefik-dashboard"
|
||||
port = "traefik"
|
||||
|
||||
check {
|
||||
name = "traefik-dashboard-health"
|
||||
type = "http"
|
||||
path = "/api/rawdata"
|
||||
interval = "30s"
|
||||
timeout = "5s"
|
||||
}
|
||||
}
|
||||
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -0,0 +1,40 @@
|
|||
job "traefik-no-service" {
|
||||
datacenters = ["dc1"]
|
||||
type = "service"
|
||||
|
||||
group "traefik" {
|
||||
count = 1
|
||||
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
value = "hcp1"
|
||||
}
|
||||
|
||||
network {
|
||||
mode = "host"
|
||||
port "http" {
|
||||
static = 80
|
||||
host_network = "tailscale0"
|
||||
}
|
||||
}
|
||||
|
||||
task "traefik" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "/usr/local/bin/traefik"
|
||||
args = [
|
||||
"--api.dashboard=true",
|
||||
"--api.insecure=true",
|
||||
"--providers.file.directory=/tmp",
|
||||
"--entrypoints.web.address=:80"
|
||||
]
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 200
|
||||
memory = 128
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -0,0 +1,68 @@
|
|||
job "traefik-simple" {
|
||||
datacenters = ["dc1"]
|
||||
type = "service"
|
||||
|
||||
group "traefik" {
|
||||
count = 1
|
||||
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
value = "hcp1"
|
||||
}
|
||||
|
||||
network {
|
||||
mode = "host"
|
||||
port "http" {
|
||||
static = 80
|
||||
host_network = "tailscale0"
|
||||
}
|
||||
port "traefik" {
|
||||
static = 8080
|
||||
host_network = "tailscale0"
|
||||
}
|
||||
}
|
||||
|
||||
task "traefik" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "/usr/local/bin/traefik"
|
||||
args = [
|
||||
"--configfile=/local/traefik.yml"
|
||||
]
|
||||
}
|
||||
|
||||
template {
|
||||
data = <<EOF
|
||||
api:
|
||||
dashboard: true
|
||||
insecure: true
|
||||
|
||||
entryPoints:
|
||||
web:
|
||||
address: "0.0.0.0:80"
|
||||
traefik:
|
||||
address: "0.0.0.0:8080"
|
||||
|
||||
providers:
|
||||
consulCatalog:
|
||||
endpoint:
|
||||
address: "warden.tailnet-68f9.ts.net:8500"
|
||||
scheme: "http"
|
||||
watch: true
|
||||
exposedByDefault: false
|
||||
prefix: "traefik"
|
||||
|
||||
log:
|
||||
level: INFO
|
||||
EOF
|
||||
destination = "local/traefik.yml"
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 500
|
||||
memory = 512
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -0,0 +1,150 @@
|
|||
job "traefik-consul-lb" {
|
||||
datacenters = ["dc1"]
|
||||
type = "service"
|
||||
|
||||
group "traefik" {
|
||||
count = 1
|
||||
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
value = "hcp1"
|
||||
}
|
||||
|
||||
update {
|
||||
min_healthy_time = "60s"
|
||||
healthy_deadline = "5m"
|
||||
progress_deadline = "10m"
|
||||
auto_revert = false
|
||||
}
|
||||
|
||||
network {
|
||||
mode = "host"
|
||||
port "http" {
|
||||
static = 80
|
||||
host_network = "tailscale0"
|
||||
}
|
||||
port "traefik" {
|
||||
static = 8080
|
||||
host_network = "tailscale0"
|
||||
}
|
||||
}
|
||||
|
||||
task "traefik" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "/usr/local/bin/traefik"
|
||||
args = [
|
||||
"--configfile=/local/traefik.yml"
|
||||
]
|
||||
}
|
||||
|
||||
template {
|
||||
data = <<EOF
|
||||
api:
|
||||
dashboard: true
|
||||
insecure: true
|
||||
|
||||
entryPoints:
|
||||
web:
|
||||
address: "100.97.62.111:80"
|
||||
traefik:
|
||||
address: "100.97.62.111:8080"
|
||||
|
||||
providers:
|
||||
file:
|
||||
filename: /local/dynamic.yml
|
||||
watch: true
|
||||
|
||||
metrics:
|
||||
prometheus:
|
||||
addEntryPointsLabels: true
|
||||
addServicesLabels: true
|
||||
addRoutersLabels: true
|
||||
|
||||
log:
|
||||
level: INFO
|
||||
EOF
|
||||
destination = "local/traefik.yml"
|
||||
}
|
||||
|
||||
template {
|
||||
data = <<EOF
|
||||
http:
|
||||
middlewares:
|
||||
consul-stripprefix:
|
||||
stripPrefix:
|
||||
prefixes:
|
||||
- "/consul"
|
||||
|
||||
traefik-stripprefix:
|
||||
stripPrefix:
|
||||
prefixes:
|
||||
- "/traefik"
|
||||
|
||||
services:
|
||||
consul-cluster:
|
||||
loadBalancer:
|
||||
servers:
|
||||
- url: "http://warden.tailnet-68f9.ts.net:8500" # 北京,优先
|
||||
- url: "http://ch4.tailnet-68f9.ts.net:8500" # 韩国,备用
|
||||
- url: "http://ash3c.tailnet-68f9.ts.net:8500" # 美国,备用
|
||||
healthCheck:
|
||||
path: "/v1/status/leader"
|
||||
interval: "30s"
|
||||
timeout: "15s"
|
||||
|
||||
routers:
|
||||
consul-api:
|
||||
rule: "PathPrefix(`/consul`)"
|
||||
service: consul-cluster
|
||||
middlewares:
|
||||
- consul-stripprefix
|
||||
entryPoints:
|
||||
- web
|
||||
|
||||
traefik-dashboard:
|
||||
rule: "PathPrefix(`/traefik`)"
|
||||
service: dashboard@internal
|
||||
middlewares:
|
||||
- traefik-stripprefix
|
||||
entryPoints:
|
||||
- web
|
||||
EOF
|
||||
destination = "local/dynamic.yml"
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 500
|
||||
memory = 512
|
||||
}
|
||||
|
||||
service {
|
||||
name = "consul-lb"
|
||||
port = "http"
|
||||
|
||||
check {
|
||||
name = "consul-lb-health"
|
||||
type = "http"
|
||||
path = "/consul/v1/status/leader"
|
||||
interval = "30s"
|
||||
timeout = "5s"
|
||||
}
|
||||
}
|
||||
|
||||
service {
|
||||
name = "traefik-dashboard"
|
||||
port = "traefik"
|
||||
|
||||
check {
|
||||
name = "traefik-dashboard-health"
|
||||
type = "http"
|
||||
path = "/api/rawdata"
|
||||
interval = "30s"
|
||||
timeout = "5s"
|
||||
}
|
||||
}
|
||||
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -0,0 +1,7 @@
# Vault configuration

## Jobs

- `vault-cluster-exec.nomad` - Vault cluster (exec driver)
- `vault-cluster-podman.nomad` - Vault cluster (podman driver)
- `vault-dev-warden.nomad` - Vault development environment
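A minimal sketch of bringing up the dev-mode job and probing it; the file path is assumed, and the host port is assigned dynamically by Nomad:

```bash
nomad job run components/vault/jobs/vault-dev-warden.nomad   # assumed path
nomad job status vault-dev-warden                            # note the dynamically allocated host port
ADDR="warden.tailnet-68f9.ts.net:PORT"                       # substitute the port reported above
curl -s "http://${ADDR}/v1/sys/health"
```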
@ -0,0 +1,283 @@
|
|||
job "vault-cluster-exec" {
|
||||
datacenters = ["dc1"]
|
||||
type = "service"
|
||||
|
||||
group "vault-ch4" {
|
||||
count = 1
|
||||
|
||||
# 使用存在的属性替代consul版本检查
|
||||
constraint {
|
||||
attribute = "${driver.exec}"
|
||||
operator = "="
|
||||
value = "1"
|
||||
}
|
||||
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
value = "ch4"
|
||||
}
|
||||
|
||||
network {
|
||||
port "api" {
|
||||
static = 8200
|
||||
}
|
||||
port "cluster" {
|
||||
static = 8201
|
||||
}
|
||||
}
|
||||
|
||||
task "vault" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "vault"
|
||||
args = [
|
||||
"server",
|
||||
"-config=/opt/nomad/data/vault/config/vault.hcl"
|
||||
]
|
||||
}
|
||||
|
||||
template {
|
||||
data = <<EOH
|
||||
storage "consul" {
|
||||
address = "{{ with nomadService "consul" }}{{ range . }}{{ if contains .Tags "http" }}{{ .Address }}:{{ .Port }}{{ end }}{{ end }}{{ end }}"
|
||||
path = "vault/"
|
||||
# Consul服务发现配置
|
||||
service {
|
||||
name = "vault"
|
||||
tags = ["vault"]
|
||||
}
|
||||
}
|
||||
|
||||
listener "tcp" {
|
||||
address = "0.0.0.0:8200"
|
||||
tls_disable = 1 # 生产环境应启用TLS
|
||||
}
|
||||
|
||||
api_addr = "http://{{ env "NOMAD_IP_api" }}:8200"
|
||||
cluster_addr = "http://{{ env "NOMAD_IP_cluster" }}:8201"
|
||||
|
||||
ui = true
|
||||
disable_mlock = true
|
||||
|
||||
# 添加更多配置来解决权限问题
|
||||
disable_sealwrap = true
|
||||
disable_cache = false
|
||||
|
||||
# 启用原始日志记录
|
||||
enable_raw_log = true
|
||||
|
||||
# 集成Nomad服务发现
|
||||
service_registration {
|
||||
enabled = true
|
||||
}
|
||||
EOH
|
||||
destination = "/opt/nomad/data/vault/config/vault.hcl"
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 100
|
||||
memory = 256
|
||||
}
|
||||
|
||||
service {
|
||||
name = "vault"
|
||||
port = "api"
|
||||
|
||||
check {
|
||||
name = "vault-health"
|
||||
type = "http"
|
||||
path = "/v1/sys/health"
|
||||
interval = "10s"
|
||||
timeout = "2s"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
group "vault-ash3c" {
|
||||
count = 1
|
||||
|
||||
# 移除对consul版本的约束,使用driver约束替代
|
||||
constraint {
|
||||
attribute = "${driver.exec}"
|
||||
operator = "="
|
||||
value = "1"
|
||||
}
|
||||
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
value = "us-ash3c"
|
||||
}
|
||||
|
||||
network {
|
||||
port "api" {
|
||||
static = 8200
|
||||
}
|
||||
port "cluster" {
|
||||
static = 8201
|
||||
}
|
||||
}
|
||||
|
||||
task "vault" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "vault"
|
||||
args = [
|
||||
"server",
|
||||
"-config=/opt/nomad/data/vault/config/vault.hcl"
|
||||
]
|
||||
}
|
||||
|
||||
template {
|
||||
data = <<EOH
|
||||
storage "consul" {
|
||||
address = "{{ with nomadService "consul" }}{{ range . }}{{ if contains .Tags "http" }}{{ .Address }}:{{ .Port }}{{ end }}{{ end }}{{ end }}"
|
||||
path = "vault/"
|
||||
# Consul服务发现配置
|
||||
service {
|
||||
name = "vault"
|
||||
tags = ["vault"]
|
||||
}
|
||||
}
|
||||
|
||||
listener "tcp" {
|
||||
address = "0.0.0.0:8200"
|
||||
tls_disable = 1 # 生产环境应启用TLS
|
||||
}
|
||||
|
||||
api_addr = "http://{{ env "NOMAD_IP_api" }}:8200"
|
||||
cluster_addr = "http://{{ env "NOMAD_IP_cluster" }}:8201"
|
||||
|
||||
ui = true
|
||||
disable_mlock = true
|
||||
|
||||
# 添加更多配置来解决权限问题
|
||||
disable_sealwrap = true
|
||||
disable_cache = false
|
||||
|
||||
# 启用原始日志记录
|
||||
enable_raw_log = true
|
||||
|
||||
# 集成Nomad服务发现
|
||||
service_registration {
|
||||
enabled = true
|
||||
}
|
||||
EOH
|
||||
destination = "/opt/nomad/data/vault/config/vault.hcl"
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 100
|
||||
memory = 256
|
||||
}
|
||||
|
||||
service {
|
||||
name = "vault"
|
||||
port = "api"
|
||||
|
||||
check {
|
||||
name = "vault-health"
|
||||
type = "http"
|
||||
path = "/v1/sys/health"
|
||||
interval = "10s"
|
||||
timeout = "2s"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
group "vault-warden" {
|
||||
count = 1
|
||||
|
||||
# 移除对consul版本的约束,使用driver约束替代
|
||||
constraint {
|
||||
attribute = "${driver.exec}"
|
||||
operator = "="
|
||||
value = "1"
|
||||
}
|
||||
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
value = "bj-warden"
|
||||
}
|
||||
|
||||
network {
|
||||
port "api" {
|
||||
static = 8200
|
||||
}
|
||||
port "cluster" {
|
||||
static = 8201
|
||||
}
|
||||
}
|
||||
|
||||
task "vault" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "vault"
|
||||
args = [
|
||||
"server",
|
||||
"-config=/opt/nomad/data/vault/config/vault.hcl"
|
||||
]
|
||||
}
|
||||
|
||||
template {
|
||||
data = <<EOH
|
||||
storage "consul" {
|
||||
address = "{{ with nomadService "consul" }}{{ range . }}{{ if contains .Tags "http" }}{{ .Address }}:{{ .Port }}{{ end }}{{ end }}{{ end }}"
|
||||
path = "vault/"
|
||||
# Consul服务发现配置
|
||||
service {
|
||||
name = "vault"
|
||||
tags = ["vault"]
|
||||
}
|
||||
}
|
||||
|
||||
listener "tcp" {
|
||||
address = "0.0.0.0:8200"
|
||||
tls_disable = 1 # 生产环境应启用TLS
|
||||
}
|
||||
|
||||
api_addr = "http://{{ env "NOMAD_IP_api" }}:8200"
|
||||
cluster_addr = "http://{{ env "NOMAD_IP_cluster" }}:8201"
|
||||
|
||||
ui = true
|
||||
disable_mlock = true
|
||||
|
||||
# 添加更多配置来解决权限问题
|
||||
disable_sealwrap = true
|
||||
disable_cache = false
|
||||
|
||||
# 启用原始日志记录
|
||||
enable_raw_log = true
|
||||
|
||||
# 集成Nomad服务发现
|
||||
service_registration {
|
||||
enabled = true
|
||||
}
|
||||
EOH
|
||||
destination = "/opt/nomad/data/vault/config/vault.hcl"
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 100
|
||||
memory = 256
|
||||
}
|
||||
|
||||
service {
|
||||
name = "vault"
|
||||
port = "api"
|
||||
|
||||
check {
|
||||
name = "vault-health"
|
||||
type = "http"
|
||||
path = "/v1/sys/health"
|
||||
interval = "10s"
|
||||
timeout = "2s"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
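After the three groups are placed, the cluster still has to be initialised and unsealed once through any node's API; a standard-workflow sketch over a Tailscale hostname:

```bash
export VAULT_ADDR=http://ch4.tailnet-68f9.ts.net:8200
vault operator init -key-shares=5 -key-threshold=3   # store the unseal keys and root token safely
vault operator unseal   # repeat until the threshold is met, then unseal the other two nodes
vault status
```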
@ -0,0 +1,94 @@
|
|||
job "vault-cluster" {
|
||||
datacenters = ["dc1"]
|
||||
type = "service"
|
||||
|
||||
group "vault-servers" {
|
||||
count = 3
|
||||
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
operator = "regexp"
|
||||
value = "(warden|ash3c|master)"
|
||||
}
|
||||
|
||||
task "vault" {
|
||||
driver = "podman"
|
||||
|
||||
config {
|
||||
image = "hashicorp/vault:latest"
|
||||
ports = ["api", "cluster"]
|
||||
|
||||
# 确保容器在退出时不会自动重启
|
||||
command = "vault"
|
||||
args = [
|
||||
"server",
|
||||
"-config=/vault/config/vault.hcl"
|
||||
]
|
||||
|
||||
# 容器网络设置
|
||||
network_mode = "host"
|
||||
|
||||
# 安全设置
|
||||
cap_add = ["IPC_LOCK"]
|
||||
}
|
||||
|
||||
template {
|
||||
data = <<EOH
|
||||
storage "consul" {
|
||||
address = "localhost:8500"
|
||||
path = "vault/"
|
||||
token = "{{ with secret "consul/creds/vault" }}{{ .Data.token }}{{ end }}"
|
||||
}
|
||||
|
||||
listener "tcp" {
|
||||
address = "0.0.0.0:8200"
|
||||
tls_disable = 1 # 生产环境应启用TLS
|
||||
}
|
||||
|
||||
api_addr = "http://{{ env "NOMAD_IP_api" }}:8200"
|
||||
cluster_addr = "http://{{ env "NOMAD_IP_cluster" }}:8201"
|
||||
|
||||
ui = true
|
||||
disable_mlock = true
|
||||
EOH
|
||||
destination = "vault/config/vault.hcl"
|
||||
}
|
||||
|
||||
volume_mount {
|
||||
volume = "vault-data"
|
||||
destination = "/vault/data"
|
||||
read_only = false
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 500
|
||||
memory = 1024
|
||||
|
||||
network {
|
||||
mbits = 10
|
||||
port "api" { static = 8200 }
|
||||
port "cluster" { static = 8201 }
|
||||
}
|
||||
}
|
||||
|
||||
service {
|
||||
name = "vault"
|
||||
port = "api"
|
||||
|
||||
check {
|
||||
name = "vault-health"
|
||||
type = "http"
|
||||
path = "/v1/sys/health"
|
||||
interval = "10s"
|
||||
timeout = "2s"
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
volume "vault-data" {
|
||||
type = "host"
|
||||
read_only = false
|
||||
source = "vault-data"
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -0,0 +1,65 @@
|
|||
job "vault-dev-warden" {
|
||||
datacenters = ["dc1"]
|
||||
type = "service"
|
||||
|
||||
group "vault-dev" {
|
||||
count = 1
|
||||
|
||||
# 约束到有consul的节点
|
||||
constraint {
|
||||
attribute = "${meta.consul}"
|
||||
operator = "="
|
||||
value = "true"
|
||||
}
|
||||
|
||||
network {
|
||||
port "http" {
|
||||
to = 8200
|
||||
}
|
||||
port "cluster" {
|
||||
to = 8201
|
||||
}
|
||||
}
|
||||
|
||||
service {
|
||||
name = "vault-dev"
|
||||
port = "http"
|
||||
|
||||
check {
|
||||
type = "http"
|
||||
path = "/v1/sys/health"
|
||||
interval = "10s"
|
||||
timeout = "5s"
|
||||
}
|
||||
}
|
||||
|
||||
task "vault-dev" {
|
||||
driver = "raw_exec"
|
||||
|
||||
config {
|
||||
command = "vault"
|
||||
args = [
|
||||
"server",
|
||||
"-dev",
|
||||
"-dev-listen-address=0.0.0.0:8200",
|
||||
"-dev-root-token-id=root"
|
||||
]
|
||||
}
|
||||
|
||||
env {
|
||||
VAULT_ADDR = "http://127.0.0.1:8200"
|
||||
VAULT_TOKEN = "root"
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 500
|
||||
memory = 512
|
||||
}
|
||||
|
||||
logs {
|
||||
max_files = 10
|
||||
max_file_size = 10
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -0,0 +1,241 @@
|
|||
job "vault-cluster-nomad" {
|
||||
datacenters = ["dc1"]
|
||||
type = "service"
|
||||
|
||||
group "vault-ch4" {
|
||||
count = 1
|
||||
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
operator = "="
|
||||
value = "ch4"
|
||||
}
|
||||
|
||||
network {
|
||||
port "http" {
|
||||
static = 8200
|
||||
to = 8200
|
||||
}
|
||||
}
|
||||
|
||||
task "vault" {
|
||||
driver = "exec"
|
||||
|
||||
consul {
|
||||
namespace = "default"
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 500
|
||||
memory = 1024
|
||||
}
|
||||
|
||||
env {
|
||||
VAULT_ADDR = "http://127.0.0.1:8200"
|
||||
}
|
||||
|
||||
# 从 consul 读取配置
|
||||
template {
|
||||
data = <<EOF
|
||||
{{ key "vault/config" }}
|
||||
EOF
|
||||
destination = "local/vault.hcl"
|
||||
perms = "644"
|
||||
wait {
|
||||
min = "2s"
|
||||
max = "10s"
|
||||
}
|
||||
}
|
||||
|
||||
config {
|
||||
command = "vault"
|
||||
args = [
|
||||
"server",
|
||||
"-config=/local/vault.hcl"
|
||||
]
|
||||
}
|
||||
|
||||
restart {
|
||||
attempts = 2
|
||||
interval = "30m"
|
||||
delay = "15s"
|
||||
mode = "fail"
|
||||
}
|
||||
}
|
||||
|
||||
update {
|
||||
max_parallel = 3
|
||||
health_check = "checks"
|
||||
min_healthy_time = "10s"
|
||||
healthy_deadline = "5m"
|
||||
progress_deadline = "10m"
|
||||
auto_revert = true
|
||||
canary = 0
|
||||
}
|
||||
|
||||
migrate {
|
||||
max_parallel = 1
|
||||
health_check = "checks"
|
||||
min_healthy_time = "10s"
|
||||
healthy_deadline = "5m"
|
||||
}
|
||||
}
|
||||
|
||||
group "vault-ash3c" {
|
||||
count = 1
|
||||
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
operator = "="
|
||||
value = "ash3c"
|
||||
}
|
||||
|
||||
network {
|
||||
port "http" {
|
||||
static = 8200
|
||||
to = 8200
|
||||
}
|
||||
}
|
||||
|
||||
task "vault" {
|
||||
driver = "exec"
|
||||
|
||||
consul {
|
||||
namespace = "default"
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 500
|
||||
memory = 1024
|
||||
}
|
||||
|
||||
env {
|
||||
VAULT_ADDR = "http://127.0.0.1:8200"
|
||||
}
|
||||
|
||||
# 从 consul 读取配置
|
||||
template {
|
||||
data = <<EOF
|
||||
{{ key "vault/config" }}
|
||||
EOF
|
||||
destination = "local/vault.hcl"
|
||||
perms = "644"
|
||||
wait {
|
||||
min = "2s"
|
||||
max = "10s"
|
||||
}
|
||||
}
|
||||
|
||||
config {
|
||||
command = "vault"
|
||||
args = [
|
||||
"server",
|
||||
"-config=/local/vault.hcl"
|
||||
]
|
||||
}
|
||||
|
||||
restart {
|
||||
attempts = 2
|
||||
interval = "30m"
|
||||
delay = "15s"
|
||||
mode = "fail"
|
||||
}
|
||||
}
|
||||
|
||||
update {
|
||||
max_parallel = 3
|
||||
health_check = "checks"
|
||||
min_healthy_time = "10s"
|
||||
healthy_deadline = "5m"
|
||||
progress_deadline = "10m"
|
||||
auto_revert = true
|
||||
canary = 0
|
||||
}
|
||||
|
||||
migrate {
|
||||
max_parallel = 1
|
||||
health_check = "checks"
|
||||
min_healthy_time = "10s"
|
||||
healthy_deadline = "5m"
|
||||
}
|
||||
}
|
||||
|
||||
group "vault-warden" {
|
||||
count = 1
|
||||
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
operator = "="
|
||||
value = "warden"
|
||||
}
|
||||
|
||||
network {
|
||||
port "http" {
|
||||
static = 8200
|
||||
to = 8200
|
||||
}
|
||||
}
|
||||
|
||||
task "vault" {
|
||||
driver = "exec"
|
||||
|
||||
consul {
|
||||
namespace = "default"
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 500
|
||||
memory = 1024
|
||||
}
|
||||
|
||||
env {
|
||||
VAULT_ADDR = "http://127.0.0.1:8200"
|
||||
}
|
||||
|
||||
# 从 consul 读取配置
|
||||
template {
|
||||
data = <<EOF
|
||||
{{ key "vault/config" }}
|
||||
EOF
|
||||
destination = "local/vault.hcl"
|
||||
perms = "644"
|
||||
wait {
|
||||
min = "2s"
|
||||
max = "10s"
|
||||
}
|
||||
}
|
||||
|
||||
config {
|
||||
command = "vault"
|
||||
args = [
|
||||
"server",
|
||||
"-config=/local/vault.hcl"
|
||||
]
|
||||
}
|
||||
|
||||
restart {
|
||||
attempts = 2
|
||||
interval = "30m"
|
||||
delay = "15s"
|
||||
mode = "fail"
|
||||
}
|
||||
}
|
||||
|
||||
update {
|
||||
max_parallel = 3
|
||||
health_check = "checks"
|
||||
min_healthy_time = "10s"
|
||||
healthy_deadline = "5m"
|
||||
progress_deadline = "10m"
|
||||
auto_revert = true
|
||||
canary = 0
|
||||
}
|
||||
|
||||
migrate {
|
||||
max_parallel = 1
|
||||
health_check = "checks"
|
||||
min_healthy_time = "10s"
|
||||
healthy_deadline = "5m"
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
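Every group's template renders `{{ key "vault/config" }}`, so that Consul KV key must exist before the allocations can start; a hedged sketch of seeding it (the listener and storage values shown are illustrative, not the project's canonical config):

```bash
# Write a shared Vault server config into Consul KV, then run the job
cat > /tmp/vault-config.hcl <<'EOF'
storage "consul" {
  address = "127.0.0.1:8500"
  path    = "vault/"
}
listener "tcp" {
  address     = "0.0.0.0:8200"
  tls_disable = 1
}
ui            = true
disable_mlock = true
EOF
consul kv put vault/config @/tmp/vault-config.hcl
```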
@ -0,0 +1,157 @@
|
|||
job "vault" {
|
||||
datacenters = ["dc1"]
|
||||
type = "service"
|
||||
|
||||
# 约束只在 warden、ch4、ash3c 节点上运行
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
operator = "regexp"
|
||||
value = "^(warden|ch4|ash3c)$"
|
||||
}
|
||||
|
||||
group "vault" {
|
||||
count = 3
|
||||
|
||||
# 确保每个节点只运行一个实例
|
||||
constraint {
|
||||
operator = "distinct_hosts"
|
||||
value = "true"
|
||||
}
|
||||
|
||||
# 网络配置
|
||||
network {
|
||||
port "http" {
|
||||
static = 8200
|
||||
to = 8200
|
||||
}
|
||||
}
|
||||
|
||||
# 服务发现配置 - 包含版本信息
|
||||
service {
|
||||
name = "vault"
|
||||
port = "http"
|
||||
|
||||
# 添加版本标签以避免检查拒绝
|
||||
tags = [
|
||||
"vault",
|
||||
"secrets",
|
||||
"version:1.20.3"
|
||||
]
|
||||
|
||||
check {
|
||||
name = "vault-health"
|
||||
type = "http"
|
||||
path = "/v1/sys/health"
|
||||
interval = "10s"
|
||||
timeout = "3s"
|
||||
method = "GET"
|
||||
|
||||
}
|
||||
|
||||
# 健康检查配置
|
||||
check {
|
||||
name = "vault-sealed-check"
|
||||
type = "script"
|
||||
command = "/bin/sh"
|
||||
args = ["-c", "vault status -format=json | jq -r '.sealed' | grep -q 'false'"]
|
||||
interval = "30s"
|
||||
timeout = "5s"
|
||||
task = "vault"
|
||||
}
|
||||
}
|
||||
|
||||
# 任务配置
|
||||
task "vault" {
|
||||
driver = "raw_exec"
|
||||
|
||||
# 资源配置
|
||||
resources {
|
||||
cpu = 500
|
||||
memory = 1024
|
||||
}
|
||||
|
||||
# 环境变量
|
||||
env {
|
||||
VAULT_ADDR = "http://127.0.0.1:8200"
|
||||
}
|
||||
|
||||
# 模板配置 - Vault 配置文件
|
||||
template {
|
||||
data = <<EOF
|
||||
ui = true
|
||||
|
||||
storage "consul" {
|
||||
address = "127.0.0.1:8500"
|
||||
path = "vault"
|
||||
}
|
||||
|
||||
# HTTP listener (不使用 TLS,因为 nomad 会处理负载均衡)
|
||||
listener "tcp" {
|
||||
address = "0.0.0.0:8200"
|
||||
tls_disable = 1
|
||||
}
|
||||
|
||||
# 禁用 mlock 以避免权限问题
|
||||
disable_mlock = true
|
||||
|
||||
# 日志配置
|
||||
log_level = "INFO"
|
||||
log_format = "json"
|
||||
|
||||
# 性能优化
|
||||
max_lease_ttl = "168h"
|
||||
default_lease_ttl = "24h"
|
||||
|
||||
# HA 配置
|
||||
ha_storage "consul" {
|
||||
address = "127.0.0.1:8500"
|
||||
path = "vault"
|
||||
}
|
||||
EOF
|
||||
destination = "local/vault.hcl"
|
||||
perms = "644"
|
||||
wait {
|
||||
min = "2s"
|
||||
max = "10s"
|
||||
}
|
||||
}
|
||||
|
||||
# 启动命令
|
||||
config {
|
||||
command = "/usr/bin/vault"
|
||||
args = [
|
||||
"agent",
|
||||
"-config=/local/vault.hcl"
|
||||
]
|
||||
}
|
||||
|
||||
|
||||
# 重启策略
|
||||
restart {
|
||||
attempts = 3
|
||||
interval = "30m"
|
||||
delay = "15s"
|
||||
mode = "fail"
|
||||
}
|
||||
}
|
||||
|
||||
# 更新策略
|
||||
update {
|
||||
max_parallel = 1
|
||||
health_check = "checks"
|
||||
min_healthy_time = "10s"
|
||||
healthy_deadline = "5m"
|
||||
progress_deadline = "10m"
|
||||
auto_revert = true
|
||||
canary = 0
|
||||
}
|
||||
|
||||
# 迁移策略
|
||||
migrate {
|
||||
max_parallel = 1
|
||||
health_check = "checks"
|
||||
min_healthy_time = "10s"
|
||||
healthy_deadline = "5m"
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
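The script check above greps the `sealed` field, so the same probe can be run by hand against any of the three nodes; a sketch:

```bash
export VAULT_ADDR=http://warden.tailnet-68f9.ts.net:8200
vault status -format=json | jq -r '.sealed'   # "false" once the node has been unsealed
```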
@ -0,0 +1,213 @@
|
|||
job "traefik-cloudflare-v1" {
|
||||
datacenters = ["dc1"]
|
||||
type = "service"
|
||||
|
||||
group "traefik" {
|
||||
count = 1
|
||||
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
value = "hcp1"
|
||||
}
|
||||
|
||||
|
||||
network {
|
||||
mode = "host"
|
||||
port "http" {
|
||||
static = 80
|
||||
host_network = "tailscale0"
|
||||
}
|
||||
port "https" {
|
||||
static = 443
|
||||
host_network = "tailscale0"
|
||||
}
|
||||
port "traefik" {
|
||||
static = 8080
|
||||
host_network = "tailscale0"
|
||||
}
|
||||
}
|
||||
|
||||
task "traefik" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "/usr/local/bin/traefik"
|
||||
args = [
|
||||
"--configfile=/local/traefik.yml"
|
||||
]
|
||||
}
|
||||
|
||||
template {
|
||||
data = <<EOF
|
||||
api:
|
||||
dashboard: true
|
||||
insecure: true
|
||||
|
||||
entryPoints:
|
||||
web:
|
||||
address: "0.0.0.0:80"
|
||||
http:
|
||||
redirections:
|
||||
entrypoint:
|
||||
to: websecure
|
||||
scheme: https
|
||||
permanent: true
|
||||
websecure:
|
||||
address: "0.0.0.0:443"
|
||||
traefik:
|
||||
address: "0.0.0.0:8080"
|
||||
|
||||
providers:
|
||||
consulCatalog:
|
||||
endpoint:
|
||||
address: "warden.tailnet-68f9.ts.net:8500"
|
||||
scheme: "http"
|
||||
watch: true
|
||||
exposedByDefault: false
|
||||
prefix: "traefik"
|
||||
defaultRule: "Host(`{{ .Name }}.git4ta.me`)"
|
||||
file:
|
||||
filename: /local/dynamic.yml
|
||||
watch: true
|
||||
|
||||
certificatesResolvers:
|
||||
cloudflare:
|
||||
acme:
|
||||
email: houzhongxu.houzhongxu@gmail.com
|
||||
storage: /local/acme.json
|
||||
dnsChallenge:
|
||||
provider: cloudflare
|
||||
delayBeforeCheck: 30s
|
||||
resolvers:
|
||||
- "1.1.1.1:53"
|
||||
- "1.0.0.1:53"
|
||||
|
||||
log:
|
||||
level: DEBUG
|
||||
EOF
|
||||
destination = "local/traefik.yml"
|
||||
}
|
||||
|
||||
template {
|
||||
data = <<EOF
|
||||
http:
|
||||
serversTransports:
|
||||
waypoint-insecure:
|
||||
insecureSkipVerify: true
|
||||
|
||||
middlewares:
|
||||
consul-stripprefix:
|
||||
stripPrefix:
|
||||
prefixes:
|
||||
- "/consul"
|
||||
waypoint-auth:
|
||||
replacePathRegex:
|
||||
regex: "^/auth/token(.*)$"
|
||||
replacement: "/auth/token$1"
|
||||
|
||||
services:
|
||||
consul-cluster:
|
||||
loadBalancer:
|
||||
servers:
|
||||
- url: "http://warden.tailnet-68f9.ts.net:8500" # 北京,优先
|
||||
- url: "http://ch4.tailnet-68f9.ts.net:8500" # 韩国,备用
|
||||
- url: "http://ash3c.tailnet-68f9.ts.net:8500" # 美国,备用
|
||||
healthCheck:
|
||||
path: "/v1/status/leader"
|
||||
interval: "30s"
|
||||
timeout: "15s"
|
||||
|
||||
nomad-cluster:
|
||||
loadBalancer:
|
||||
servers:
|
||||
- url: "http://warden.tailnet-68f9.ts.net:4646" # 北京,优先
|
||||
- url: "http://ch4.tailnet-68f9.ts.net:4646" # 韩国,备用
|
||||
- url: "http://ash3c.tailnet-68f9.ts.net:4646" # 美国,备用
|
||||
healthCheck:
|
||||
path: "/v1/status/leader"
|
||||
interval: "30s"
|
||||
timeout: "15s"
|
||||
|
||||
waypoint-cluster:
|
||||
loadBalancer:
|
||||
servers:
|
||||
- url: "https://hcp1.tailnet-68f9.ts.net:9701" # hcp1 节点 HTTPS API
|
||||
serversTransport: waypoint-insecure
|
||||
|
||||
vault-cluster:
|
||||
loadBalancer:
|
||||
servers:
|
||||
- url: "http://ch4.tailnet-68f9.ts.net:8200" # 韩国,活跃节点
|
||||
- url: "http://ash3c.tailnet-68f9.ts.net:8200" # 美国,备用节点
|
||||
- url: "http://warden.tailnet-68f9.ts.net:8200" # 北京,备用节点
|
||||
healthCheck:
|
||||
path: "/v1/sys/health"
|
||||
interval: "30s"
|
||||
timeout: "15s"
|
||||
|
||||
routers:
|
||||
consul-api:
|
||||
rule: "Host(`consul.git4ta.me`)"
|
||||
service: consul-cluster
|
||||
middlewares:
|
||||
- consul-stripprefix
|
||||
entryPoints:
|
||||
- websecure
|
||||
tls:
|
||||
certResolver: cloudflare
|
||||
|
||||
traefik-dashboard:
|
||||
rule: "Host(`traefik.git4ta.me`)"
|
||||
service: dashboard@internal
|
||||
middlewares:
|
||||
- dashboard_redirect@internal
|
||||
- dashboard_stripprefix@internal
|
||||
entryPoints:
|
||||
- websecure
|
||||
tls:
|
||||
certResolver: cloudflare
|
||||
|
||||
nomad-ui:
|
||||
rule: "Host(`nomad.git4ta.me`)"
|
||||
service: nomad-cluster
|
||||
entryPoints:
|
||||
- websecure
|
||||
tls:
|
||||
certResolver: cloudflare
|
||||
|
||||
waypoint-ui:
|
||||
rule: "Host(`waypoint.git4ta.me`)"
|
||||
service: waypoint-cluster
|
||||
entryPoints:
|
||||
- websecure
|
||||
tls:
|
||||
certResolver: cloudflare
|
||||
|
||||
vault-ui:
|
||||
rule: "Host(`vault.git4ta.me`)"
|
||||
service: vault-cluster
|
||||
entryPoints:
|
||||
- websecure
|
||||
tls:
|
||||
certResolver: cloudflare
|
||||
EOF
|
||||
destination = "local/dynamic.yml"
|
||||
}
|
||||
|
||||
template {
|
||||
data = <<EOF
|
||||
CLOUDFLARE_EMAIL=houzhongxu.houzhongxu@gmail.com
|
||||
CLOUDFLARE_DNS_API_TOKEN=HYT-cfZTP_jq6Xd9g3tpFMwxopOyIrf8LZpmGAI3
|
||||
CLOUDFLARE_ZONE_API_TOKEN=HYT-cfZTP_jq6Xd9g3tpFMwxopOyIrf8LZpmGAI3
|
||||
EOF
|
||||
destination = "local/cloudflare.env"
|
||||
env = true
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 500
|
||||
memory = 512
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -0,0 +1,241 @@
|
|||
job "vault-cluster-nomad" {
|
||||
datacenters = ["dc1"]
|
||||
type = "service"
|
||||
|
||||
group "vault-ch4" {
|
||||
count = 1
|
||||
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
operator = "="
|
||||
value = "ch4"
|
||||
}
|
||||
|
||||
network {
|
||||
port "http" {
|
||||
static = 8200
|
||||
to = 8200
|
||||
}
|
||||
}
|
||||
|
||||
task "vault" {
|
||||
driver = "exec"
|
||||
|
||||
consul {
|
||||
namespace = "default"
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 500
|
||||
memory = 1024
|
||||
}
|
||||
|
||||
env {
|
||||
VAULT_ADDR = "http://127.0.0.1:8200"
|
||||
}
|
||||
|
||||
# Read the configuration from Consul KV
|
||||
template {
|
||||
data = <<EOF
|
||||
{{ key "vault/config" }}
|
||||
EOF
|
||||
destination = "local/vault.hcl"
|
||||
perms = "644"
|
||||
wait {
|
||||
min = "2s"
|
||||
max = "10s"
|
||||
}
|
||||
}
|
||||
|
||||
config {
|
||||
command = "vault"
|
||||
args = [
|
||||
"server",
|
||||
"-config=/local/vault.hcl"
|
||||
]
|
||||
}
|
||||
|
||||
restart {
|
||||
attempts = 2
|
||||
interval = "30m"
|
||||
delay = "15s"
|
||||
mode = "fail"
|
||||
}
|
||||
}
|
||||
|
||||
update {
|
||||
max_parallel = 3
|
||||
health_check = "checks"
|
||||
min_healthy_time = "10s"
|
||||
healthy_deadline = "5m"
|
||||
progress_deadline = "10m"
|
||||
auto_revert = true
|
||||
canary = 0
|
||||
}
|
||||
|
||||
migrate {
|
||||
max_parallel = 1
|
||||
health_check = "checks"
|
||||
min_healthy_time = "10s"
|
||||
healthy_deadline = "5m"
|
||||
}
|
||||
}
|
||||
|
||||
group "vault-ash3c" {
|
||||
count = 1
|
||||
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
operator = "="
|
||||
value = "ash3c"
|
||||
}
|
||||
|
||||
network {
|
||||
port "http" {
|
||||
static = 8200
|
||||
to = 8200
|
||||
}
|
||||
}
|
||||
|
||||
task "vault" {
|
||||
driver = "exec"
|
||||
|
||||
consul {
|
||||
namespace = "default"
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 500
|
||||
memory = 1024
|
||||
}
|
||||
|
||||
env {
|
||||
VAULT_ADDR = "http://127.0.0.1:8200"
|
||||
}
|
||||
|
||||
# Read the configuration from Consul KV
|
||||
template {
|
||||
data = <<EOF
|
||||
{{ key "vault/config" }}
|
||||
EOF
|
||||
destination = "local/vault.hcl"
|
||||
perms = "644"
|
||||
wait {
|
||||
min = "2s"
|
||||
max = "10s"
|
||||
}
|
||||
}
|
||||
|
||||
config {
|
||||
command = "vault"
|
||||
args = [
|
||||
"server",
|
||||
"-config=/local/vault.hcl"
|
||||
]
|
||||
}
|
||||
|
||||
restart {
|
||||
attempts = 2
|
||||
interval = "30m"
|
||||
delay = "15s"
|
||||
mode = "fail"
|
||||
}
|
||||
}
|
||||
|
||||
update {
|
||||
max_parallel = 3
|
||||
health_check = "checks"
|
||||
min_healthy_time = "10s"
|
||||
healthy_deadline = "5m"
|
||||
progress_deadline = "10m"
|
||||
auto_revert = true
|
||||
canary = 0
|
||||
}
|
||||
|
||||
migrate {
|
||||
max_parallel = 1
|
||||
health_check = "checks"
|
||||
min_healthy_time = "10s"
|
||||
healthy_deadline = "5m"
|
||||
}
|
||||
}
|
||||
|
||||
group "vault-warden" {
|
||||
count = 1
|
||||
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
operator = "="
|
||||
value = "warden"
|
||||
}
|
||||
|
||||
network {
|
||||
port "http" {
|
||||
static = 8200
|
||||
to = 8200
|
||||
}
|
||||
}
|
||||
|
||||
task "vault" {
|
||||
driver = "exec"
|
||||
|
||||
consul {
|
||||
namespace = "default"
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 500
|
||||
memory = 1024
|
||||
}
|
||||
|
||||
env {
|
||||
VAULT_ADDR = "http://127.0.0.1:8200"
|
||||
}
|
||||
|
||||
# Read the configuration from Consul KV
|
||||
template {
|
||||
data = <<EOF
|
||||
{{ key "vault/config" }}
|
||||
EOF
|
||||
destination = "local/vault.hcl"
|
||||
perms = "644"
|
||||
wait {
|
||||
min = "2s"
|
||||
max = "10s"
|
||||
}
|
||||
}
|
||||
|
||||
config {
|
||||
command = "vault"
|
||||
args = [
|
||||
"server",
|
||||
"-config=/local/vault.hcl"
|
||||
]
|
||||
}
|
||||
|
||||
restart {
|
||||
attempts = 2
|
||||
interval = "30m"
|
||||
delay = "15s"
|
||||
mode = "fail"
|
||||
}
|
||||
}
|
||||
|
||||
update {
|
||||
max_parallel = 3
|
||||
health_check = "checks"
|
||||
min_healthy_time = "10s"
|
||||
healthy_deadline = "5m"
|
||||
progress_deadline = "10m"
|
||||
auto_revert = true
|
||||
canary = 0
|
||||
}
|
||||
|
||||
migrate {
|
||||
max_parallel = 1
|
||||
health_check = "checks"
|
||||
min_healthy_time = "10s"
|
||||
healthy_deadline = "5m"
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -0,0 +1,49 @@
|
|||
job "waypoint-server" {
|
||||
datacenters = ["dc1"]
|
||||
type = "service"
|
||||
|
||||
group "waypoint" {
|
||||
count = 1
|
||||
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
value = "hcp1"
|
||||
}
|
||||
|
||||
network {
|
||||
port "http" {
|
||||
static = 9701
|
||||
}
|
||||
port "grpc" {
|
||||
static = 9702
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
task "waypoint" {
|
||||
driver = "raw_exec"
|
||||
|
||||
config {
|
||||
command = "/usr/local/bin/waypoint"
|
||||
|
||||
args = [
|
||||
"server", "run",
|
||||
"-accept-tos",
|
||||
"-vvv",
|
||||
"-db=/opt/waypoint/waypoint.db",
|
||||
"-listen-grpc=0.0.0.0:9702",
|
||||
"-listen-http=0.0.0.0:9701"
|
||||
]
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 500
|
||||
memory = 512
|
||||
}
|
||||
|
||||
env {
|
||||
WAYPOINT_LOG_LEVEL = "DEBUG"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -0,0 +1,19 @@
|
|||
# Consul configuration

## Deployment

```bash
nomad job run components/consul/jobs/consul-cluster.nomad
```

## Job details

- **Job name**: `consul-cluster-nomad`
- **Type**: service
- **Nodes**: master, ash3c, warden

## Access

- Master: `http://master.tailnet-68f9.ts.net:8500`
- Ash3c: `http://ash3c.tailnet-68f9.ts.net:8500`
- Warden: `http://warden.tailnet-68f9.ts.net:8500`

These endpoints can be spot-checked with the script below.
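Since none of these nodes is reachable as localhost, the quickest sanity check is to query each server's HTTP API over Tailscale. A minimal sketch, assuming the hostnames above resolve on the tailnet and no ACL token is required:

```bash
#!/usr/bin/env bash
# Confirm every Consul server sees the same leader and the expected peer set.
for host in master ash3c warden; do
  addr="http://${host}.tailnet-68f9.ts.net:8500"
  leader=$(curl -fsS "${addr}/v1/status/leader")
  peers=$(curl -fsS "${addr}/v1/status/peers")
  echo "${host}: leader=${leader} peers=${peers}"
done
```

All three nodes should report the same leader address; an empty leader string means no election has completed yet.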
|
||||
|
|
@ -0,0 +1,88 @@
|
|||
# Consul configuration file
|
||||
# This file contains the complete Consul configuration, including variables and storage-related settings
|
||||
|
||||
# Base configuration
|
||||
data_dir = "/opt/consul/data"
|
||||
raft_dir = "/opt/consul/raft"
|
||||
|
||||
# Enable the UI
|
||||
ui_config {
|
||||
enabled = true
|
||||
}
|
||||
|
||||
# Datacenter configuration
|
||||
datacenter = "dc1"
|
||||
|
||||
# Server configuration
|
||||
server = true
|
||||
bootstrap_expect = 3
|
||||
|
||||
# Network configuration
|
||||
client_addr = "0.0.0.0"
|
||||
bind_addr = "{{ GetInterfaceIP `eth0` }}"
|
||||
advertise_addr = "{{ GetInterfaceIP `eth0` }}"
|
||||
|
||||
# Port configuration
|
||||
ports {
|
||||
dns = 8600
|
||||
http = 8500
|
||||
https = -1
|
||||
grpc = 8502
|
||||
grpc_tls = 8503
|
||||
serf_lan = 8301
|
||||
serf_wan = 8302
|
||||
server = 8300
|
||||
}
|
||||
|
||||
# Cluster join
|
||||
retry_join = ["100.117.106.136", "100.116.80.94", "100.122.197.112"]
|
||||
|
||||
# Service discovery
|
||||
enable_service_script = true
|
||||
enable_script_checks = true
|
||||
enable_local_script_checks = true
|
||||
|
||||
# Performance tuning
|
||||
performance {
|
||||
raft_multiplier = 1
|
||||
}
|
||||
|
||||
# Logging configuration
|
||||
log_level = "INFO"
|
||||
enable_syslog = false
|
||||
log_file = "/var/log/consul/consul.log"
|
||||
|
||||
# Security configuration
|
||||
encrypt = "YourEncryptionKeyHere"
|
||||
|
||||
# Connection settings
|
||||
reconnect_timeout = "30s"
|
||||
reconnect_timeout_wan = "30s"
|
||||
session_ttl_min = "10s"
|
||||
|
||||
# Autopilot configuration
|
||||
autopilot {
|
||||
cleanup_dead_servers = true
|
||||
last_contact_threshold = "200ms"
|
||||
max_trailing_logs = 250
|
||||
server_stabilization_time = "10s"
|
||||
redundancy_zone_tag = ""
|
||||
disable_upgrade_migration = false
|
||||
upgrade_version_tag = ""
|
||||
}
|
||||
|
||||
# Snapshot configuration
|
||||
snapshot {
|
||||
enabled = true
|
||||
interval = "24h"
|
||||
retain = 30
|
||||
name = "consul-snapshot-{{.Timestamp}}"
|
||||
}
|
||||
|
||||
# Backup configuration
|
||||
backup {
|
||||
enabled = true
|
||||
interval = "6h"
|
||||
retain = 7
|
||||
name = "consul-backup-{{.Timestamp}}"
|
||||
}
|
||||
|
|
@ -0,0 +1,93 @@
|
|||
# Consul configuration template file
|
||||
# This file uses Consul template syntax to pull configuration dynamically from the KV store
|
||||
# Follows the config/{environment}/{provider}/{region_or_service}/{key} format (a KV seeding sketch follows this file)
|
||||
|
||||
# Base configuration
|
||||
data_dir = "{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/cluster/data_dir` `/opt/consul/data` }}"
|
||||
raft_dir = "{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/cluster/raft_dir` `/opt/consul/raft` }}"
|
||||
|
||||
# Enable the UI
|
||||
ui_config {
|
||||
enabled = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/ui/enabled` `true` }}
|
||||
}
|
||||
|
||||
# Datacenter configuration
|
||||
datacenter = "{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/cluster/datacenter` `dc1` }}"
|
||||
|
||||
# Server configuration
|
||||
server = true
|
||||
bootstrap_expect = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/cluster/bootstrap_expect` `3` }}
|
||||
|
||||
# Network configuration
|
||||
client_addr = "{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/network/client_addr` `0.0.0.0` }}"
|
||||
bind_addr = "{{ GetInterfaceIP (keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/network/bind_interface` `ens160`) }}"
|
||||
advertise_addr = "{{ GetInterfaceIP (keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/network/advertise_interface` `ens160`) }}"
|
||||
|
||||
# Port configuration
|
||||
ports {
|
||||
dns = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/ports/dns` `8600` }}
|
||||
http = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/ports/http` `8500` }}
|
||||
https = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/ports/https` `-1` }}
|
||||
grpc = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/ports/grpc` `8502` }}
|
||||
grpc_tls = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/ports/grpc_tls` `8503` }}
|
||||
serf_lan = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/ports/serf_lan` `8301` }}
|
||||
serf_wan = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/ports/serf_wan` `8302` }}
|
||||
server = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/ports/server` `8300` }}
|
||||
}
|
||||
|
||||
# Cluster join - node IPs resolved dynamically
|
||||
retry_join = [
|
||||
"{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/nodes/master/ip` `100.117.106.136` }}",
|
||||
"{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/nodes/ash3c/ip` `100.116.80.94` }}",
|
||||
"{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/nodes/warden/ip` `100.122.197.112` }}"
|
||||
]
|
||||
|
||||
# Service discovery
|
||||
enable_service_script = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/service/enable_service_script` `true` }}
|
||||
enable_script_checks = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/service/enable_script_checks` `true` }}
|
||||
enable_local_script_checks = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/service/enable_local_script_checks` `true` }}
|
||||
|
||||
# Performance tuning
|
||||
performance {
|
||||
raft_multiplier = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/performance/raft_multiplier` `1` }}
|
||||
}
|
||||
|
||||
# Logging configuration
|
||||
log_level = "{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/cluster/log_level` `INFO` }}"
|
||||
enable_syslog = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/log/enable_syslog` `false` }}
|
||||
log_file = "{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/log/log_file` `/var/log/consul/consul.log` }}"
|
||||
|
||||
# Security configuration
|
||||
encrypt = "{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/cluster/encrypt_key` `YourEncryptionKeyHere` }}"
|
||||
|
||||
# Connection settings
|
||||
reconnect_timeout = "{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/connection/reconnect_timeout` `30s` }}"
|
||||
reconnect_timeout_wan = "{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/connection/reconnect_timeout_wan` `30s` }}"
|
||||
session_ttl_min = "{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/connection/session_ttl_min` `10s` }}"
|
||||
|
||||
# Autopilot configuration
|
||||
autopilot {
|
||||
cleanup_dead_servers = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/autopilot/cleanup_dead_servers` `true` }}
|
||||
last_contact_threshold = "{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/autopilot/last_contact_threshold` `200ms` }}"
|
||||
max_trailing_logs = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/autopilot/max_trailing_logs` `250` }}
|
||||
server_stabilization_time = "{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/autopilot/server_stabilization_time` `10s` }}"
|
||||
redundancy_zone_tag = ""
|
||||
disable_upgrade_migration = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/autopilot/disable_upgrade_migration` `false` }}
|
||||
upgrade_version_tag = ""
|
||||
}
|
||||
|
||||
# Snapshot configuration
|
||||
snapshot {
|
||||
enabled = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/snapshot/enabled` `true` }}
|
||||
interval = "{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/snapshot/interval` `24h` }}"
|
||||
retain = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/snapshot/retain` `30` }}
|
||||
name = "{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/snapshot/name` `consul-snapshot-{{.Timestamp}}` }}"
|
||||
}
|
||||
|
||||
# Backup configuration
|
||||
backup {
|
||||
enabled = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/backup/enabled` `true` }}
|
||||
interval = "{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/backup/interval` `6h` }}"
|
||||
retain = {{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/backup/retain` `7` }}
|
||||
name = "{{ keyOrDefault `config/` + env "ENVIRONMENT" + `/consul/backup/name` `consul-backup-{{.Timestamp}}` }}"
|
||||
}
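The template above falls back to its defaults whenever a key is missing, so a cluster can be bootstrapped by seeding only the values that differ per environment. A minimal sketch for a hypothetical `dev` environment; the Consul address and the chosen values are assumptions, while the key paths match the ones referenced in the template:

```bash
#!/usr/bin/env bash
# Seed a handful of the KV entries the configuration template reads.
export CONSUL_HTTP_ADDR="http://warden.tailnet-68f9.ts.net:8500"   # assumption: reachable over Tailscale
ENVIRONMENT=dev
consul kv put "config/${ENVIRONMENT}/consul/cluster/datacenter" "dc1"
consul kv put "config/${ENVIRONMENT}/consul/cluster/bootstrap_expect" "3"
consul kv put "config/${ENVIRONMENT}/consul/network/bind_interface" "tailscale0"
consul kv put "config/${ENVIRONMENT}/consul/nodes/master/ip" "100.117.106.136"
```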
|
||||
|
|
@ -0,0 +1,158 @@
|
|||
job "consul-cluster-nomad" {
|
||||
datacenters = ["dc1"]
|
||||
type = "service"
|
||||
|
||||
group "consul-ch4" {
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
value = "ch4"
|
||||
}
|
||||
|
||||
network {
|
||||
port "http" {
|
||||
static = 8500
|
||||
}
|
||||
port "server" {
|
||||
static = 8300
|
||||
}
|
||||
port "serf-lan" {
|
||||
static = 8301
|
||||
}
|
||||
port "serf-wan" {
|
||||
static = 8302
|
||||
}
|
||||
}
|
||||
|
||||
task "consul" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "consul"
|
||||
args = [
|
||||
"agent",
|
||||
"-server",
|
||||
"-bootstrap-expect=3",
|
||||
"-data-dir=/opt/nomad/data/consul",
|
||||
"-client=0.0.0.0",
|
||||
"-bind={{ env \"NOMAD_IP_http\" }}",
|
||||
"-advertise={{ env \"NOMAD_IP_http\" }}",
|
||||
"-retry-join=ash3c.tailnet-68f9.ts.net:8301",
|
||||
"-retry-join=warden.tailnet-68f9.ts.net:8301",
|
||||
"-ui",
|
||||
"-http-port=8500",
|
||||
"-server-port=8300",
|
||||
"-serf-lan-port=8301",
|
||||
"-serf-wan-port=8302"
|
||||
]
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 300
|
||||
memory = 512
|
||||
}
|
||||
|
||||
}
|
||||
}
|
||||
|
||||
group "consul-ash3c" {
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
value = "ash3c"
|
||||
}
|
||||
|
||||
network {
|
||||
port "http" {
|
||||
static = 8500
|
||||
}
|
||||
port "server" {
|
||||
static = 8300
|
||||
}
|
||||
port "serf-lan" {
|
||||
static = 8301
|
||||
}
|
||||
port "serf-wan" {
|
||||
static = 8302
|
||||
}
|
||||
}
|
||||
|
||||
task "consul" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "consul"
|
||||
args = [
|
||||
"agent",
|
||||
"-server",
|
||||
"-data-dir=/opt/nomad/data/consul",
|
||||
"-client=0.0.0.0",
|
||||
"-bind={{ env \"NOMAD_IP_http\" }}",
|
||||
"-advertise={{ env \"NOMAD_IP_http\" }}",
|
||||
"-retry-join=ch4.tailnet-68f9.ts.net:8301",
|
||||
"-retry-join=warden.tailnet-68f9.ts.net:8301",
|
||||
"-ui",
|
||||
"-http-port=8500",
|
||||
"-server-port=8300",
|
||||
"-serf-lan-port=8301",
|
||||
"-serf-wan-port=8302"
|
||||
]
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 300
|
||||
memory = 512
|
||||
}
|
||||
|
||||
}
|
||||
}
|
||||
|
||||
group "consul-warden" {
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
value = "warden"
|
||||
}
|
||||
|
||||
network {
|
||||
port "http" {
|
||||
static = 8500
|
||||
}
|
||||
port "server" {
|
||||
static = 8300
|
||||
}
|
||||
port "serf-lan" {
|
||||
static = 8301
|
||||
}
|
||||
port "serf-wan" {
|
||||
static = 8302
|
||||
}
|
||||
}
|
||||
|
||||
task "consul" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "consul"
|
||||
args = [
|
||||
"agent",
|
||||
"-server",
|
||||
"-data-dir=/opt/nomad/data/consul",
|
||||
"-client=0.0.0.0",
|
||||
"-bind={{ env \"NOMAD_IP_http\" }}",
|
||||
"-advertise={{ env \"NOMAD_IP_http\" }}",
|
||||
"-retry-join=ch4.tailnet-68f9.ts.net:8301",
|
||||
"-retry-join=ash3c.tailnet-68f9.ts.net:8301",
|
||||
"-ui",
|
||||
"-http-port=8500",
|
||||
"-server-port=8300",
|
||||
"-serf-lan-port=8301",
|
||||
"-serf-wan-port=8302"
|
||||
]
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 300
|
||||
memory = 512
|
||||
}
|
||||
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -0,0 +1,158 @@
|
|||
job "consul-cluster-nomad" {
|
||||
datacenters = ["dc1"]
|
||||
type = "service"
|
||||
|
||||
group "consul-ch4" {
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
value = "ch4"
|
||||
}
|
||||
|
||||
network {
|
||||
port "http" {
|
||||
static = 8500
|
||||
}
|
||||
port "server" {
|
||||
static = 8300
|
||||
}
|
||||
port "serf-lan" {
|
||||
static = 8301
|
||||
}
|
||||
port "serf-wan" {
|
||||
static = 8302
|
||||
}
|
||||
}
|
||||
|
||||
task "consul" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "consul"
|
||||
args = [
|
||||
"agent",
|
||||
"-server",
|
||||
"-bootstrap-expect=3",
|
||||
"-data-dir=/opt/nomad/data/consul",
|
||||
"-client=0.0.0.0",
|
||||
"-bind={{ env "NOMAD_IP_http" }}",
|
||||
"-advertise={{ env "NOMAD_IP_http" }}",
|
||||
"-retry-join=ash3c.tailnet-68f9.ts.net:8301",
|
||||
"-retry-join=warden.tailnet-68f9.ts.net:8301",
|
||||
"-ui",
|
||||
"-http-port=8500",
|
||||
"-server-port=8300",
|
||||
"-serf-lan-port=8301",
|
||||
"-serf-wan-port=8302"
|
||||
]
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 300
|
||||
memory = 512
|
||||
}
|
||||
|
||||
}
|
||||
}
|
||||
|
||||
group "consul-ash3c" {
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
value = "ash3c"
|
||||
}
|
||||
|
||||
network {
|
||||
port "http" {
|
||||
static = 8500
|
||||
}
|
||||
port "server" {
|
||||
static = 8300
|
||||
}
|
||||
port "serf-lan" {
|
||||
static = 8301
|
||||
}
|
||||
port "serf-wan" {
|
||||
static = 8302
|
||||
}
|
||||
}
|
||||
|
||||
task "consul" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "consul"
|
||||
args = [
|
||||
"agent",
|
||||
"-server",
|
||||
"-data-dir=/opt/nomad/data/consul",
|
||||
"-client=0.0.0.0",
|
||||
"-bind={{ env "NOMAD_IP_http" }}",
|
||||
"-advertise={{ env "NOMAD_IP_http" }}",
|
||||
"-retry-join=ch4.tailnet-68f9.ts.net:8301",
|
||||
"-retry-join=warden.tailnet-68f9.ts.net:8301",
|
||||
"-ui",
|
||||
"-http-port=8500",
|
||||
"-server-port=8300",
|
||||
"-serf-lan-port=8301",
|
||||
"-serf-wan-port=8302"
|
||||
]
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 300
|
||||
memory = 512
|
||||
}
|
||||
|
||||
}
|
||||
}
|
||||
|
||||
group "consul-warden" {
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
value = "warden"
|
||||
}
|
||||
|
||||
network {
|
||||
port "http" {
|
||||
static = 8500
|
||||
}
|
||||
port "server" {
|
||||
static = 8300
|
||||
}
|
||||
port "serf-lan" {
|
||||
static = 8301
|
||||
}
|
||||
port "serf-wan" {
|
||||
static = 8302
|
||||
}
|
||||
}
|
||||
|
||||
task "consul" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "consul"
|
||||
args = [
|
||||
"agent",
|
||||
"-server",
|
||||
"-data-dir=/opt/nomad/data/consul",
|
||||
"-client=0.0.0.0",
|
||||
"-bind={{ env "NOMAD_IP_http" }}",
|
||||
"-advertise={{ env "NOMAD_IP_http" }}",
|
||||
"-retry-join=ch4.tailnet-68f9.ts.net:8301",
|
||||
"-retry-join=ash3c.tailnet-68f9.ts.net:8301",
|
||||
"-ui",
|
||||
"-http-port=8500",
|
||||
"-server-port=8300",
|
||||
"-serf-lan-port=8301",
|
||||
"-serf-wan-port=8302"
|
||||
]
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 300
|
||||
memory = 512
|
||||
}
|
||||
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -0,0 +1,8 @@
|
|||
# Nomad configuration

## Jobs

- `install-podman-driver.nomad` - Install the Podman driver
- `nomad-consul-config.nomad` - Nomad-Consul configuration
- `nomad-consul-setup.nomad` - Nomad-Consul setup
- `nomad-nfs-volume.nomad` - NFS volume configuration

Each job is deployed with `nomad job run`, as sketched below.
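A minimal deployment sketch, assuming the job files live under `components/nomad/jobs/` and that `NOMAD_ADDR` points at any reachable server on the tailnet (both assumptions, since the list above does not state the paths):

```bash
#!/usr/bin/env bash
export NOMAD_ADDR="http://warden.tailnet-68f9.ts.net:4646"   # assumption: any server node works
nomad job run components/nomad/jobs/install-podman-driver.nomad
nomad job status    # the job name inside the file is assumed to match the file name
```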
|
||||
|
|
@ -0,0 +1,43 @@
|
|||
job "juicefs-controller" {
|
||||
datacenters = ["dc1"]
|
||||
type = "system"
|
||||
|
||||
group "controller" {
|
||||
task "plugin" {
|
||||
driver = "podman"
|
||||
|
||||
config {
|
||||
image = "juicedata/juicefs-csi-driver:v0.14.1"
|
||||
args = [
|
||||
"--endpoint=unix://csi/csi.sock",
|
||||
"--logtostderr",
|
||||
"--nodeid=${node.unique.id}",
|
||||
"--v=5",
|
||||
"--by-process=true"
|
||||
]
|
||||
privileged = true
|
||||
}
|
||||
|
||||
csi_plugin {
|
||||
id = "juicefs-nfs"
|
||||
type = "controller"
|
||||
mount_dir = "/csi"
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 100
|
||||
memory = 512
|
||||
}
|
||||
|
||||
env {
|
||||
POD_NAME = "csi-controller"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
|
@ -0,0 +1,38 @@
|
|||
job "juicefs-csi-controller" {
|
||||
datacenters = ["dc1"]
|
||||
type = "system"
|
||||
|
||||
group "controller" {
|
||||
task "juicefs-csi-driver" {
|
||||
driver = "podman"
|
||||
|
||||
config {
|
||||
image = "juicedata/juicefs-csi-driver:v0.14.1"
|
||||
args = [
|
||||
"--endpoint=unix://csi/csi.sock",
|
||||
"--logtostderr",
|
||||
"--nodeid=${node.unique.id}",
|
||||
"--v=5"
|
||||
]
|
||||
privileged = true
|
||||
}
|
||||
|
||||
env {
|
||||
POD_NAME = "juicefs-csi-controller"
|
||||
POD_NAMESPACE = "default"
|
||||
NODE_NAME = "${node.unique.id}"
|
||||
}
|
||||
|
||||
csi_plugin {
|
||||
id = "juicefs0"
|
||||
type = "controller"
|
||||
mount_dir = "/csi"
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 100
|
||||
memory = 512
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -0,0 +1,43 @@
|
|||
# NFS CSI Volume Definition for Nomad
|
||||
# This file defines the CSI volume so the NFS storage shows up in the Nomad UI
|
||||
|
||||
volume "nfs-shared-csi" {
|
||||
type = "csi"
|
||||
|
||||
# CSI plugin name
|
||||
source = "csi-nfs"
|
||||
|
||||
# Capacity settings
|
||||
capacity_min = "1GiB"
|
||||
capacity_max = "10TiB"
|
||||
|
||||
# Access mode - multi-node read/write
|
||||
access_mode = "multi-node-multi-writer"
|
||||
|
||||
# Mount options
|
||||
mount_options {
|
||||
fs_type = "nfs4"
|
||||
mount_flags = "rw,relatime,vers=4.2"
|
||||
}
|
||||
|
||||
# Topology constraint - ensure it runs on nodes that have the NFS mount
|
||||
topology_request {
|
||||
required {
|
||||
topology {
|
||||
"node" = "{{ range $node := nomadNodes }}{{ if eq $node.Status "ready" }}{{ $node.Name }}{{ end }}{{ end }}"
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
# Volume parameters
|
||||
parameters {
|
||||
server = "snail"
|
||||
share = "/fs/1000/nfs/Fnsync"
|
||||
}
|
||||
}
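A volume definition like this is not picked up automatically; it has to be created (or registered) against the CSI plugin before jobs can claim it. A minimal sketch, assuming the definition is saved as `nfs-shared-csi.volume.hcl` (the file name is an assumption), the `csi-nfs` plugin is already healthy, and the templated topology block above has first been replaced with concrete node names:

```bash
#!/usr/bin/env bash
nomad plugin status csi-nfs                     # the plugin should report healthy controller and node instances
nomad volume create nfs-shared-csi.volume.hcl   # or `nomad volume register` for a share that already exists
nomad volume status nfs-shared-csi
```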
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
|
@ -0,0 +1,22 @@
|
|||
# Dynamic Host Volume Definition for NFS
|
||||
# This file defines a dynamic host volume so the NFS storage shows up in the Nomad UI
|
||||
|
||||
volume "nfs-shared-dynamic" {
|
||||
type = "host"
|
||||
|
||||
# Use a dynamic host volume
|
||||
source = "fnsync"
|
||||
|
||||
# Read-only setting
|
||||
read_only = false
|
||||
|
||||
# Capacity information (for display only)
|
||||
capacity_min = "1GiB"
|
||||
capacity_max = "10TiB"
|
||||
}
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
|
@ -0,0 +1,22 @@
|
|||
# NFS Host Volume Definition for Nomad UI
|
||||
# This file defines a host volume so the NFS storage shows up in the Nomad UI
|
||||
|
||||
volume "nfs-shared-host" {
|
||||
type = "host"
|
||||
|
||||
# Use a host volume
|
||||
source = "fnsync"
|
||||
|
||||
# Read-only setting
|
||||
read_only = false
|
||||
|
||||
# Capacity information (for display only)
|
||||
capacity_min = "1GiB"
|
||||
capacity_max = "10TiB"
|
||||
}
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
|
@ -0,0 +1,28 @@
|
|||
# Traefik configuration

## Deployment

```bash
nomad job run components/traefik/jobs/traefik.nomad
```

## Configuration highlights

- Binds explicitly to the Tailscale IP (100.97.62.111)
- Geographically ordered Consul backends (Beijing → Korea → US)
- Relaxed health checks suited to trans-Pacific links
- No service health checks, to avoid flapping

## Access

- Dashboard: `http://hcp1.tailnet-68f9.ts.net:8080/dashboard/`
- Direct IP: `http://100.97.62.111:8080/dashboard/`
- Consul LB: `http://hcp1.tailnet-68f9.ts.net:80`

## Troubleshooting

If a service keeps flapping:
1. Check whether an RFC1918 private address is being used
2. Confirm Tailscale network connectivity (see the probe sketch below)
3. Increase the health-check interval
4. Account for the network latency between regions
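To tell a slow backend from a broken one, probe the same health-check paths Traefik uses from the hcp1 node. A minimal sketch; the hostnames and paths are the ones configured in the job's dynamic configuration:

```bash
#!/usr/bin/env bash
# Probe each Consul backend the way Traefik does and report the total request time.
for url in \
  "http://warden.tailnet-68f9.ts.net:8500/v1/status/leader" \
  "http://ch4.tailnet-68f9.ts.net:8500/v1/status/leader" \
  "http://ash3c.tailnet-68f9.ts.net:8500/v1/status/leader"; do
  if t=$(curl -fsS -o /dev/null -w '%{time_total}' --max-time 15 "$url"); then
    echo "OK   ${url} (${t}s)"
  else
    echo "FAIL ${url}"
  fi
done
```

Times that regularly approach the 15s health-check timeout explain the flapping better than any Traefik-side setting.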
|
||||
|
|
@ -0,0 +1,123 @@
|
|||
http:
|
||||
serversTransports:
|
||||
waypoint-insecure:
|
||||
insecureSkipVerify: true
|
||||
authentik-insecure:
|
||||
insecureSkipVerify: true
|
||||
|
||||
middlewares:
|
||||
consul-stripprefix:
|
||||
stripPrefix:
|
||||
prefixes:
|
||||
- "/consul"
|
||||
waypoint-auth:
|
||||
replacePathRegex:
|
||||
regex: "^/auth/token(.*)$"
|
||||
replacement: "/auth/token$1"
|
||||
|
||||
services:
|
||||
consul-cluster:
|
||||
loadBalancer:
|
||||
servers:
|
||||
- url: "http://ch4.tailnet-68f9.ts.net:8500" # 韩国,Leader
|
||||
- url: "http://warden.tailnet-68f9.ts.net:8500" # 北京,Follower
|
||||
- url: "http://ash3c.tailnet-68f9.ts.net:8500" # 美国,Follower
|
||||
healthCheck:
|
||||
path: "/v1/status/leader"
|
||||
interval: "30s"
|
||||
timeout: "15s"
|
||||
|
||||
nomad-cluster:
|
||||
loadBalancer:
|
||||
servers:
|
||||
- url: "http://ch2.tailnet-68f9.ts.net:4646" # 韩国,Leader
|
||||
- url: "http://warden.tailnet-68f9.ts.net:4646" # 北京,Follower
|
||||
- url: "http://ash3c.tailnet-68f9.ts.net:4646" # 美国,Follower
|
||||
healthCheck:
|
||||
path: "/v1/status/leader"
|
||||
interval: "30s"
|
||||
timeout: "15s"
|
||||
|
||||
waypoint-cluster:
|
||||
loadBalancer:
|
||||
servers:
|
||||
- url: "https://hcp1.tailnet-68f9.ts.net:9701" # hcp1 节点 HTTPS API
|
||||
serversTransport: waypoint-insecure
|
||||
|
||||
vault-cluster:
|
||||
loadBalancer:
|
||||
servers:
|
||||
- url: "http://warden.tailnet-68f9.ts.net:8200" # 北京,单节点
|
||||
healthCheck:
|
||||
path: "/ui/"
|
||||
interval: "30s"
|
||||
timeout: "15s"
|
||||
|
||||
authentik-cluster:
|
||||
loadBalancer:
|
||||
servers:
|
||||
- url: "https://authentik.tailnet-68f9.ts.net:9443" # Authentik容器HTTPS端口
|
||||
serversTransport: authentik-insecure
|
||||
healthCheck:
|
||||
path: "/flows/-/default/authentication/"
|
||||
interval: "30s"
|
||||
timeout: "15s"
|
||||
|
||||
routers:
|
||||
consul-api:
|
||||
rule: "Host(`consul.git4ta.tech`)"
|
||||
service: consul-cluster
|
||||
entryPoints:
|
||||
- websecure
|
||||
tls:
|
||||
certResolver: cloudflare
|
||||
middlewares:
|
||||
- consul-stripprefix
|
||||
|
||||
consul-ui:
|
||||
rule: "Host(`consul.git-4ta.live`) && PathPrefix(`/ui`)"
|
||||
service: consul-cluster
|
||||
entryPoints:
|
||||
- websecure
|
||||
tls:
|
||||
certResolver: cloudflare
|
||||
|
||||
nomad-api:
|
||||
rule: "Host(`nomad.git-4ta.live`)"
|
||||
service: nomad-cluster
|
||||
entryPoints:
|
||||
- websecure
|
||||
tls:
|
||||
certResolver: cloudflare
|
||||
|
||||
nomad-ui:
|
||||
rule: "Host(`nomad.git-4ta.live`) && PathPrefix(`/ui`)"
|
||||
service: nomad-cluster
|
||||
entryPoints:
|
||||
- websecure
|
||||
tls:
|
||||
certResolver: cloudflare
|
||||
|
||||
waypoint-ui:
|
||||
rule: "Host(`waypoint.git-4ta.live`)"
|
||||
service: waypoint-cluster
|
||||
entryPoints:
|
||||
- websecure
|
||||
tls:
|
||||
certResolver: cloudflare
|
||||
|
||||
vault-ui:
|
||||
rule: "Host(`vault.git-4ta.live`)"
|
||||
service: vault-cluster
|
||||
entryPoints:
|
||||
- websecure
|
||||
tls:
|
||||
certResolver: cloudflare
|
||||
|
||||
authentik-ui:
|
||||
rule: "Host(`authentik1.git-4ta.live`)"
|
||||
service: authentik-cluster
|
||||
entryPoints:
|
||||
- websecure
|
||||
tls:
|
||||
certResolver: cloudflare
|
||||
|
|
@ -0,0 +1,254 @@
|
|||
job "traefik-cloudflare-v2" {
|
||||
datacenters = ["dc1"]
|
||||
type = "service"
|
||||
|
||||
group "traefik" {
|
||||
count = 1
|
||||
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
operator = "="
|
||||
value = "hcp1"
|
||||
}
|
||||
|
||||
volume "traefik-certs" {
|
||||
type = "host"
|
||||
read_only = false
|
||||
source = "traefik-certs"
|
||||
}
|
||||
|
||||
network {
|
||||
mode = "host"
|
||||
port "http" {
|
||||
static = 80
|
||||
}
|
||||
port "https" {
|
||||
static = 443
|
||||
}
|
||||
port "traefik" {
|
||||
static = 8080
|
||||
}
|
||||
}
|
||||
|
||||
task "traefik" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "/usr/local/bin/traefik"
|
||||
args = [
|
||||
"--configfile=/local/traefik.yml"
|
||||
]
|
||||
}
|
||||
|
||||
env {
|
||||
CLOUDFLARE_EMAIL = "houzhongxu.houzhongxu@gmail.com"
|
||||
CLOUDFLARE_DNS_API_TOKEN = "HYT-cfZTP_jq6Xd9g3tpFMwxopOyIrf8LZpmGAI3"
|
||||
CLOUDFLARE_ZONE_API_TOKEN = "HYT-cfZTP_jq6Xd9g3tpFMwxopOyIrf8LZpmGAI3"
|
||||
}
|
||||
|
||||
volume_mount {
|
||||
volume = "traefik-certs"
|
||||
destination = "/opt/traefik/certs"
|
||||
read_only = false
|
||||
}
|
||||
|
||||
template {
|
||||
data = <<EOF
|
||||
api:
|
||||
dashboard: true
|
||||
insecure: true
|
||||
debug: true
|
||||
|
||||
entryPoints:
|
||||
web:
|
||||
address: "0.0.0.0:80"
|
||||
http:
|
||||
redirections:
|
||||
entrypoint:
|
||||
to: websecure
|
||||
scheme: https
|
||||
permanent: true
|
||||
websecure:
|
||||
address: "0.0.0.0:443"
|
||||
traefik:
|
||||
address: "0.0.0.0:8080"
|
||||
|
||||
providers:
|
||||
consulCatalog:
|
||||
endpoint:
|
||||
address: "warden.tailnet-68f9.ts.net:8500"
|
||||
scheme: "http"
|
||||
watch: true
|
||||
exposedByDefault: false
|
||||
prefix: "traefik"
|
||||
defaultRule: "Host(`{{ .Name }}.git-4ta.live`)"
|
||||
file:
|
||||
filename: /local/dynamic.yml
|
||||
watch: true
|
||||
|
||||
certificatesResolvers:
|
||||
cloudflare:
|
||||
acme:
|
||||
email: {{ env "CLOUDFLARE_EMAIL" }}
|
||||
storage: /opt/traefik/certs/acme.json
|
||||
dnsChallenge:
|
||||
provider: cloudflare
|
||||
delayBeforeCheck: 30s
|
||||
resolvers:
|
||||
- "1.1.1.1:53"
|
||||
- "1.0.0.1:53"
|
||||
|
||||
log:
|
||||
level: DEBUG
|
||||
EOF
|
||||
destination = "local/traefik.yml"
|
||||
}
|
||||
|
||||
template {
|
||||
data = <<EOF
|
||||
http:
|
||||
serversTransports:
|
||||
waypoint-insecure:
|
||||
insecureSkipVerify: true
|
||||
authentik-insecure:
|
||||
insecureSkipVerify: true
|
||||
|
||||
middlewares:
|
||||
consul-stripprefix:
|
||||
stripPrefix:
|
||||
prefixes:
|
||||
- "/consul"
|
||||
waypoint-auth:
|
||||
replacePathRegex:
|
||||
regex: "^/auth/token(.*)$"
|
||||
replacement: "/auth/token$1"
|
||||
|
||||
services:
|
||||
consul-cluster:
|
||||
loadBalancer:
|
||||
servers:
|
||||
- url: "http://ch4.tailnet-68f9.ts.net:8500" # 韩国,Leader
|
||||
- url: "http://warden.tailnet-68f9.ts.net:8500" # 北京,Follower
|
||||
- url: "http://ash3c.tailnet-68f9.ts.net:8500" # 美国,Follower
|
||||
healthCheck:
|
||||
path: "/v1/status/leader"
|
||||
interval: "30s"
|
||||
timeout: "15s"
|
||||
|
||||
nomad-cluster:
|
||||
loadBalancer:
|
||||
servers:
|
||||
- url: "http://ch2.tailnet-68f9.ts.net:4646" # 韩国,Leader
|
||||
- url: "http://warden.tailnet-68f9.ts.net:4646" # 北京,Follower
|
||||
- url: "http://ash3c.tailnet-68f9.ts.net:4646" # 美国,Follower
|
||||
healthCheck:
|
||||
path: "/v1/status/leader"
|
||||
interval: "30s"
|
||||
timeout: "15s"
|
||||
|
||||
waypoint-cluster:
|
||||
loadBalancer:
|
||||
servers:
|
||||
- url: "https://hcp1.tailnet-68f9.ts.net:9701" # hcp1 节点 HTTPS API
|
||||
serversTransport: waypoint-insecure
|
||||
|
||||
vault-cluster:
|
||||
loadBalancer:
|
||||
servers:
|
||||
- url: "http://warden.tailnet-68f9.ts.net:8200" # 北京,单节点
|
||||
healthCheck:
|
||||
path: "/ui/"
|
||||
interval: "30s"
|
||||
timeout: "15s"
|
||||
|
||||
authentik-cluster:
|
||||
loadBalancer:
|
||||
servers:
|
||||
- url: "https://authentik.tailnet-68f9.ts.net:9443" # Authentik容器HTTPS端口
|
||||
serversTransport: authentik-insecure
|
||||
healthCheck:
|
||||
path: "/flows/-/default/authentication/"
|
||||
interval: "30s"
|
||||
timeout: "15s"
|
||||
|
||||
routers:
|
||||
consul-api:
|
||||
rule: "Host(`consul.git-4ta.live`)"
|
||||
service: consul-cluster
|
||||
middlewares:
|
||||
- consul-stripprefix
|
||||
entryPoints:
|
||||
- websecure
|
||||
tls:
|
||||
certResolver: cloudflare
|
||||
|
||||
traefik-dashboard:
|
||||
rule: "Host(`traefik.git-4ta.live`)"
|
||||
service: dashboard@internal
|
||||
middlewares:
|
||||
- dashboard_redirect@internal
|
||||
- dashboard_stripprefix@internal
|
||||
entryPoints:
|
||||
- websecure
|
||||
tls:
|
||||
certResolver: cloudflare
|
||||
|
||||
nomad-ui:
|
||||
rule: "Host(`nomad.git-4ta.live`)"
|
||||
service: nomad-cluster
|
||||
entryPoints:
|
||||
- websecure
|
||||
tls:
|
||||
certResolver: cloudflare
|
||||
|
||||
waypoint-ui:
|
||||
rule: "Host(`waypoint.git-4ta.live`)"
|
||||
service: waypoint-cluster
|
||||
entryPoints:
|
||||
- websecure
|
||||
tls:
|
||||
certResolver: cloudflare
|
||||
|
||||
vault-ui:
|
||||
rule: "Host(`vault.git-4ta.live`)"
|
||||
service: vault-cluster
|
||||
entryPoints:
|
||||
- websecure
|
||||
tls:
|
||||
certResolver: cloudflare
|
||||
|
||||
authentik-ui:
|
||||
rule: "Host(`authentik.git-4ta.live`)"
|
||||
service: authentik-cluster
|
||||
entryPoints:
|
||||
- websecure
|
||||
tls:
|
||||
certResolver: cloudflare
|
||||
EOF
|
||||
destination = "local/dynamic.yml"
|
||||
}
|
||||
|
||||
template {
|
||||
data = <<EOF
|
||||
CLOUDFLARE_EMAIL={{ env "CLOUDFLARE_EMAIL" }}
|
||||
CLOUDFLARE_DNS_API_TOKEN={{ env "CLOUDFLARE_DNS_API_TOKEN" }}
|
||||
CLOUDFLARE_ZONE_API_TOKEN={{ env "CLOUDFLARE_ZONE_API_TOKEN" }}
|
||||
EOF
|
||||
destination = "local/cloudflare.env"
|
||||
env = true
|
||||
}
|
||||
|
||||
# Test certificate permission enforcement
|
||||
template {
|
||||
data = "-----BEGIN CERTIFICATE-----\nTEST CERTIFICATE FOR PERMISSION CONTROL\n-----END CERTIFICATE-----"
|
||||
destination = "/opt/traefik/certs/test-cert.pem"
|
||||
perms = "600"
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 500
|
||||
memory = 512
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -0,0 +1,239 @@
|
|||
job "traefik-cloudflare-v2" {
|
||||
datacenters = ["dc1"]
|
||||
type = "service"
|
||||
|
||||
group "traefik" {
|
||||
count = 1
|
||||
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
value = "hcp1"
|
||||
}
|
||||
|
||||
volume "traefik-certs" {
|
||||
type = "host"
|
||||
read_only = false
|
||||
source = "traefik-certs"
|
||||
}
|
||||
|
||||
network {
|
||||
mode = "host"
|
||||
port "http" {
|
||||
static = 80
|
||||
}
|
||||
port "https" {
|
||||
static = 443
|
||||
}
|
||||
port "traefik" {
|
||||
static = 8080
|
||||
}
|
||||
}
|
||||
|
||||
task "traefik" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "/usr/local/bin/traefik"
|
||||
args = [
|
||||
"--configfile=/local/traefik.yml"
|
||||
]
|
||||
}
|
||||
|
||||
volume_mount {
|
||||
volume = "traefik-certs"
|
||||
destination = "/opt/traefik/certs"
|
||||
read_only = false
|
||||
}
|
||||
|
||||
template {
|
||||
data = <<EOF
|
||||
api:
|
||||
dashboard: true
|
||||
insecure: true
|
||||
|
||||
entryPoints:
|
||||
web:
|
||||
address: "0.0.0.0:80"
|
||||
http:
|
||||
redirections:
|
||||
entrypoint:
|
||||
to: websecure
|
||||
scheme: https
|
||||
permanent: true
|
||||
websecure:
|
||||
address: "0.0.0.0:443"
|
||||
traefik:
|
||||
address: "0.0.0.0:8080"
|
||||
|
||||
providers:
|
||||
consulCatalog:
|
||||
endpoint:
|
||||
address: "warden.tailnet-68f9.ts.net:8500"
|
||||
scheme: "http"
|
||||
watch: true
|
||||
exposedByDefault: false
|
||||
prefix: "traefik"
|
||||
defaultRule: "Host(`{{ .Name }}.git-4ta.live`)"
|
||||
file:
|
||||
filename: /local/dynamic.yml
|
||||
watch: true
|
||||
|
||||
certificatesResolvers:
|
||||
cloudflare:
|
||||
acme:
|
||||
email: houzhongxu.houzhongxu@gmail.com
|
||||
storage: /opt/traefik/certs/acme.json
|
||||
dnsChallenge:
|
||||
provider: cloudflare
|
||||
delayBeforeCheck: 30s
|
||||
resolvers:
|
||||
- "1.1.1.1:53"
|
||||
- "1.0.0.1:53"
|
||||
|
||||
log:
|
||||
level: DEBUG
|
||||
EOF
|
||||
destination = "local/traefik.yml"
|
||||
}
|
||||
|
||||
template {
|
||||
data = <<EOF
|
||||
http:
|
||||
serversTransports:
|
||||
waypoint-insecure:
|
||||
insecureSkipVerify: true
|
||||
authentik-insecure:
|
||||
insecureSkipVerify: true
|
||||
|
||||
middlewares:
|
||||
consul-stripprefix:
|
||||
stripPrefix:
|
||||
prefixes:
|
||||
- "/consul"
|
||||
waypoint-auth:
|
||||
replacePathRegex:
|
||||
regex: "^/auth/token(.*)$"
|
||||
replacement: "/auth/token$1"
|
||||
|
||||
services:
|
||||
consul-cluster:
|
||||
loadBalancer:
|
||||
servers:
|
||||
- url: "http://ch4.tailnet-68f9.ts.net:8500" # 韩国,Leader
|
||||
- url: "http://warden.tailnet-68f9.ts.net:8500" # 北京,Follower
|
||||
- url: "http://ash3c.tailnet-68f9.ts.net:8500" # 美国,Follower
|
||||
healthCheck:
|
||||
path: "/v1/status/leader"
|
||||
interval: "30s"
|
||||
timeout: "15s"
|
||||
|
||||
nomad-cluster:
|
||||
loadBalancer:
|
||||
servers:
|
||||
- url: "http://ch2.tailnet-68f9.ts.net:4646" # 韩国,Leader
|
||||
- url: "http://warden.tailnet-68f9.ts.net:4646" # 北京,Follower
|
||||
- url: "http://ash3c.tailnet-68f9.ts.net:4646" # 美国,Follower
|
||||
healthCheck:
|
||||
path: "/v1/status/leader"
|
||||
interval: "30s"
|
||||
timeout: "15s"
|
||||
|
||||
waypoint-cluster:
|
||||
loadBalancer:
|
||||
servers:
|
||||
- url: "https://hcp1.tailnet-68f9.ts.net:9701" # hcp1 节点 HTTPS API
|
||||
serversTransport: waypoint-insecure
|
||||
|
||||
vault-cluster:
|
||||
loadBalancer:
|
||||
servers:
|
||||
- url: "http://warden.tailnet-68f9.ts.net:8200" # 北京,单节点
|
||||
healthCheck:
|
||||
path: "/ui/"
|
||||
interval: "30s"
|
||||
timeout: "15s"
|
||||
|
||||
authentik-cluster:
|
||||
loadBalancer:
|
||||
servers:
|
||||
- url: "https://authentik.tailnet-68f9.ts.net:9443" # Authentik容器HTTPS端口
|
||||
serversTransport: authentik-insecure
|
||||
healthCheck:
|
||||
path: "/flows/-/default/authentication/"
|
||||
interval: "30s"
|
||||
timeout: "15s"
|
||||
|
||||
routers:
|
||||
consul-api:
|
||||
rule: "Host(`consul.git-4ta.live`)"
|
||||
service: consul-cluster
|
||||
middlewares:
|
||||
- consul-stripprefix
|
||||
entryPoints:
|
||||
- websecure
|
||||
tls:
|
||||
certResolver: cloudflare
|
||||
|
||||
traefik-dashboard:
|
||||
rule: "Host(`traefik.git-4ta.live`)"
|
||||
service: dashboard@internal
|
||||
middlewares:
|
||||
- dashboard_redirect@internal
|
||||
- dashboard_stripprefix@internal
|
||||
entryPoints:
|
||||
- websecure
|
||||
tls:
|
||||
certResolver: cloudflare
|
||||
|
||||
nomad-ui:
|
||||
rule: "Host(`nomad.git-4ta.live`)"
|
||||
service: nomad-cluster
|
||||
entryPoints:
|
||||
- websecure
|
||||
tls:
|
||||
certResolver: cloudflare
|
||||
|
||||
waypoint-ui:
|
||||
rule: "Host(`waypoint.git-4ta.live`)"
|
||||
service: waypoint-cluster
|
||||
entryPoints:
|
||||
- websecure
|
||||
tls:
|
||||
certResolver: cloudflare
|
||||
|
||||
vault-ui:
|
||||
rule: "Host(`vault.git-4ta.live`)"
|
||||
service: vault-cluster
|
||||
entryPoints:
|
||||
- websecure
|
||||
tls:
|
||||
certResolver: cloudflare
|
||||
|
||||
authentik-ui:
|
||||
rule: "Host(`authentik.git4ta.tech`)"
|
||||
service: authentik-cluster
|
||||
entryPoints:
|
||||
- websecure
|
||||
tls:
|
||||
certResolver: cloudflare
|
||||
EOF
|
||||
destination = "local/dynamic.yml"
|
||||
}
|
||||
|
||||
template {
|
||||
data = <<EOF
|
||||
CLOUDFLARE_EMAIL=houzhongxu.houzhongxu@gmail.com
|
||||
CLOUDFLARE_DNS_API_TOKEN=0aPWoLaQ59l0nyL1jIVzZaEx2e41Gjgcfhn3ztJr
|
||||
CLOUDFLARE_ZONE_API_TOKEN=0aPWoLaQ59l0nyL1jIVzZaEx2e41Gjgcfhn3ztJr
|
||||
EOF
|
||||
destination = "local/cloudflare.env"
|
||||
env = true
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 500
|
||||
memory = 512
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -0,0 +1,249 @@
|
|||
job "traefik-cloudflare-v3" {
|
||||
datacenters = ["dc1"]
|
||||
type = "service"
|
||||
|
||||
group "traefik" {
|
||||
count = 1
|
||||
|
||||
constraint {
|
||||
attribute = "${node.unique.name}"
|
||||
value = "hcp1"
|
||||
}
|
||||
|
||||
volume "traefik-certs" {
|
||||
type = "host"
|
||||
read_only = false
|
||||
source = "traefik-certs"
|
||||
}
|
||||
|
||||
network {
|
||||
mode = "host"
|
||||
port "http" {
|
||||
static = 80
|
||||
}
|
||||
port "https" {
|
||||
static = 443
|
||||
}
|
||||
port "traefik" {
|
||||
static = 8080
|
||||
}
|
||||
}
|
||||
|
||||
task "traefik" {
|
||||
driver = "exec"
|
||||
|
||||
config {
|
||||
command = "/usr/local/bin/traefik"
|
||||
args = [
|
||||
"--configfile=/local/traefik.yml"
|
||||
]
|
||||
}
|
||||
|
||||
env {
|
||||
CLOUDFLARE_EMAIL = "locksmithknight@gmail.com"
|
||||
CLOUDFLARE_DNS_API_TOKEN = "0aPWoLaQ59l0nyL1jIVzZaEx2e41Gjgcfhn3ztJr"
|
||||
CLOUDFLARE_ZONE_API_TOKEN = "0aPWoLaQ59l0nyL1jIVzZaEx2e41Gjgcfhn3ztJr"
|
||||
}
|
||||
|
||||
volume_mount {
|
||||
volume = "traefik-certs"
|
||||
destination = "/opt/traefik/certs"
|
||||
read_only = false
|
||||
}
|
||||
|
||||
template {
|
||||
data = <<EOF
|
||||
api:
|
||||
dashboard: true
|
||||
insecure: true
|
||||
|
||||
entryPoints:
|
||||
web:
|
||||
address: "0.0.0.0:80"
|
||||
http:
|
||||
redirections:
|
||||
entrypoint:
|
||||
to: websecure
|
||||
scheme: https
|
||||
permanent: true
|
||||
websecure:
|
||||
address: "0.0.0.0:443"
|
||||
traefik:
|
||||
address: "0.0.0.0:8080"
|
||||
|
||||
providers:
|
||||
consulCatalog:
|
||||
endpoint:
|
||||
address: "warden.tailnet-68f9.ts.net:8500"
|
||||
scheme: "http"
|
||||
watch: true
|
||||
exposedByDefault: false
|
||||
prefix: "traefik"
|
||||
defaultRule: "Host(`{{ .Name }}.git-4ta.live`)"
|
||||
file:
|
||||
filename: /local/dynamic.yml
|
||||
watch: true
|
||||
|
||||
certificatesResolvers:
|
||||
cloudflare:
|
||||
acme:
|
||||
email: {{ env "CLOUDFLARE_EMAIL" }}
|
||||
storage: /opt/traefik/certs/acme.json
|
||||
dnsChallenge:
|
||||
provider: cloudflare
|
||||
delayBeforeCheck: 30s
|
||||
|
||||
log:
|
||||
level: DEBUG
|
||||
EOF
|
||||
destination = "local/traefik.yml"
|
||||
}
|
||||
|
||||
template {
|
||||
data = <<EOF
|
||||
http:
|
||||
serversTransports:
|
||||
waypoint-insecure:
|
||||
insecureSkipVerify: true
|
||||
authentik-insecure:
|
||||
insecureSkipVerify: true
|
||||
|
||||
middlewares:
|
||||
consul-stripprefix:
|
||||
stripPrefix:
|
||||
prefixes:
|
||||
- "/consul"
|
||||
waypoint-auth:
|
||||
replacePathRegex:
|
||||
regex: "^/auth/token(.*)$"
|
||||
replacement: "/auth/token$1"
|
||||
|
||||
services:
|
||||
consul-cluster:
|
||||
loadBalancer:
|
||||
servers:
|
||||
- url: "http://ch4.tailnet-68f9.ts.net:8500" # 韩国,Leader
|
||||
- url: "http://warden.tailnet-68f9.ts.net:8500" # 北京,Follower
|
||||
- url: "http://ash3c.tailnet-68f9.ts.net:8500" # 美国,Follower
|
||||
healthCheck:
|
||||
path: "/v1/status/leader"
|
||||
interval: "30s"
|
||||
timeout: "15s"
|
||||
|
||||
nomad-cluster:
|
||||
loadBalancer:
|
||||
servers:
|
||||
- url: "http://ch2.tailnet-68f9.ts.net:4646" # 韩国,Leader
|
||||
- url: "http://ash3c.tailnet-68f9.ts.net:4646" # 美国,Follower
|
||||
healthCheck:
|
||||
path: "/v1/status/leader"
|
||||
interval: "30s"
|
||||
timeout: "15s"
|
||||
|
||||
waypoint-cluster:
|
||||
loadBalancer:
|
||||
servers:
|
||||
- url: "https://hcp1.tailnet-68f9.ts.net:9701" # hcp1 节点 HTTPS API
|
||||
serversTransport: waypoint-insecure
|
||||
|
||||
vault-cluster:
|
||||
loadBalancer:
|
||||
servers:
|
||||
- url: "http://warden.tailnet-68f9.ts.net:8200" # 北京,单节点
|
||||
healthCheck:
|
||||
path: "/ui/"
|
||||
interval: "30s"
|
||||
timeout: "15s"
|
||||
|
||||
authentik-cluster:
|
||||
loadBalancer:
|
||||
servers:
|
||||
- url: "https://authentik.tailnet-68f9.ts.net:9443" # Authentik容器HTTPS端口
|
||||
serversTransport: authentik-insecure
|
||||
healthCheck:
|
||||
path: "/flows/-/default/authentication/"
|
||||
interval: "30s"
|
||||
timeout: "15s"
|
||||
|
||||
routers:
|
||||
consul-api:
|
||||
rule: "Host(`consul.git-4ta.live`)"
|
||||
service: consul-cluster
|
||||
middlewares:
|
||||
- consul-stripprefix
|
||||
entryPoints:
|
||||
- websecure
|
||||
tls:
|
||||
certResolver: cloudflare
|
||||
|
||||
traefik-dashboard:
|
||||
rule: "Host(`traefik.git-4ta.live`)"
|
||||
service: dashboard@internal
|
||||
middlewares:
|
||||
- dashboard_redirect@internal
|
||||
- dashboard_stripprefix@internal
|
||||
entryPoints:
|
||||
- websecure
|
||||
tls:
|
||||
certResolver: cloudflare
|
||||
|
||||
traefik-api:
|
||||
rule: "Host(`traefik.git-4ta.live`) && PathPrefix(`/api`)"
|
||||
service: api@internal
|
||||
entryPoints:
|
||||
- websecure
|
||||
tls:
|
||||
certResolver: cloudflare
|
||||
|
||||
nomad-ui:
|
||||
rule: "Host(`nomad.git-4ta.live`)"
|
||||
service: nomad-cluster
|
||||
entryPoints:
|
||||
- websecure
|
||||
tls:
|
||||
certResolver: cloudflare
|
||||
|
||||
waypoint-ui:
|
||||
rule: "Host(`waypoint.git-4ta.live`)"
|
||||
service: waypoint-cluster
|
||||
entryPoints:
|
||||
- websecure
|
||||
tls:
|
||||
certResolver: cloudflare
|
||||
|
||||
vault-ui:
|
||||
rule: "Host(`vault.git-4ta.live`)"
|
||||
service: vault-cluster
|
||||
entryPoints:
|
||||
- websecure
|
||||
tls:
|
||||
certResolver: cloudflare
|
||||
|
||||
authentik-ui:
|
||||
rule: "Host(`authentik1.git-4ta.live`)"
|
||||
service: authentik-cluster
|
||||
entryPoints:
|
||||
- websecure
|
||||
tls:
|
||||
certResolver: cloudflare
|
||||
EOF
|
||||
destination = "local/dynamic.yml"
|
||||
}
|
||||
|
||||
template {
|
||||
data = <<EOF
|
||||
CLOUDFLARE_EMAIL=locksmithknight@gmail.com
|
||||
CLOUDFLARE_DNS_API_TOKEN=0aPWoLaQ59l0nyL1jIVzZaEx2e41Gjgcfhn3ztJr
|
||||
CLOUDFLARE_ZONE_API_TOKEN=0aPWoLaQ59l0nyL1jIVzZaEx2e41Gjgcfhn3ztJr
|
||||
EOF
|
||||
destination = "local/cloudflare.env"
|
||||
env = true
|
||||
}
|
||||
|
||||
resources {
|
||||
cpu = 500
|
||||
memory = 512
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
@ -0,0 +1,7 @@
|
|||
# Vault configuration

## Jobs

- `vault-cluster-exec.nomad` - Vault cluster (exec driver)
- `vault-cluster-podman.nomad` - Vault cluster (podman driver)
- `vault-dev-warden.nomad` - Vault development environment

The cluster jobs expect their configuration in Consul KV, as sketched below.
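The cluster jobs render their server configuration from the Consul KV key `vault/config` (`{{ key "vault/config" }}` in the job templates), so that key must exist before the job is deployed. A minimal sketch, assuming the configuration is kept in a local `vault.hcl` and that the job file path below matches the list above (both assumptions):

```bash
#!/usr/bin/env bash
export CONSUL_HTTP_ADDR="http://warden.tailnet-68f9.ts.net:8500"   # assumption: reachable over Tailscale
consul kv put vault/config @vault.hcl
nomad job run components/vault/jobs/vault-cluster-exec.nomad
nomad job status vault-cluster-nomad
vault status -address=http://warden.tailnet-68f9.ts.net:8200       # expect "Sealed: true" until unsealed
```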
|
||||
|
|
@ -0,0 +1,104 @@
|
|||
# Project management Makefile

.PHONY: help setup init plan apply destroy clean test lint docs

# Default target
help: ## Show this help
	@echo "Available commands:"
	@grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | sort | awk 'BEGIN {FS = ":.*?## "}; {printf "\033[36m%-20s\033[0m %s\n", $$1, $$2}'

# Environment setup
setup: ## Set up the development environment
	@echo "🚀 Setting up the development environment..."
	@bash scripts/setup/environment/setup-environment.sh

# OpenTofu operations
init: ## Initialize OpenTofu
	@echo "🏗️ Initializing OpenTofu..."
	@cd infrastructure/environments/dev && tofu init

plan: ## Generate an execution plan
	@echo "📋 Generating the execution plan..."
	@cd infrastructure/environments/dev && tofu plan -var-file="terraform.tfvars"

apply: ## Apply infrastructure changes
	@echo "🚀 Applying infrastructure changes..."
	@cd infrastructure/environments/dev && tofu apply -var-file="terraform.tfvars"

destroy: ## Destroy the infrastructure
	@echo "💥 Destroying the infrastructure..."
	@cd infrastructure/environments/dev && tofu destroy -var-file="terraform.tfvars"

# Ansible operations
ansible-check: ## Check the Ansible configuration
	@echo "🔍 Checking the Ansible configuration..."
	@cd configuration && ansible-playbook --syntax-check playbooks/bootstrap/main.yml

ansible-deploy: ## Deploy the applications
	@echo "📦 Deploying applications..."
	@cd configuration && ansible-playbook -i inventories/production/inventory.ini playbooks/bootstrap/main.yml

# Podman operations
podman-build: ## Build the Podman images
	@echo "📦 Building Podman images..."
	@podman-compose -f containers/compose/development/docker-compose.yml build

podman-up: ## Start the development environment
	@echo "🚀 Starting the development environment..."
	@podman-compose -f containers/compose/development/docker-compose.yml up -d

podman-down: ## Stop the development environment
	@echo "🛑 Stopping the development environment..."
	@podman-compose -f containers/compose/development/docker-compose.yml down

# Testing
test: ## Run the tests
	@echo "🧪 Running tests..."
	@bash scripts/testing/test-runner.sh

test-mcp: ## Run the MCP server tests
	@echo "🧪 Running MCP server tests..."
	@bash scripts/testing/mcp/test_local_mcp_servers.sh

test-kali: ## Run the quick Kali Linux health check
	@echo "🧪 Running the quick Kali Linux health check..."
	@cd configuration && ansible-playbook -i inventories/production/inventory.ini playbooks/test/kali-health-check.yml

test-kali-security: ## Run the Kali Linux security tool tests
	@echo "🧪 Running Kali Linux security tool tests..."
	@cd configuration && ansible-playbook -i inventories/production/inventory.ini playbooks/test/kali-security-tools.yml

test-kali-full: ## Run the full Kali Linux test suite
	@echo "🧪 Running the full Kali Linux test suite..."
	@cd configuration && ansible-playbook playbooks/test/kali-full-test-suite.yml

lint: ## Lint the code
	@echo "🔍 Linting..."
	@bash scripts/ci-cd/quality/lint.sh

# Documentation
docs: ## Generate the documentation
	@echo "📚 Generating documentation..."
	@bash scripts/ci-cd/build/generate-docs.sh

# Cleanup
clean: ## Clean up temporary files
	@echo "🧹 Cleaning up temporary files..."
	@find . -name "*.tfstate*" -delete
	@find . -name ".terraform" -type d -exec rm -rf {} + 2>/dev/null || true
	@podman system prune -f

# Backup
backup: ## Create a backup
	@echo "💾 Creating a backup..."
	@bash scripts/utilities/backup/backup-all.sh

# Monitoring
monitor: ## Start monitoring
	@echo "📊 Starting monitoring..."
	@podman-compose -f containers/compose/production/monitoring.yml up -d

# Security scan
security-scan: ## Run the security scan
	@echo "🔒 Running the security scan..."
	@bash scripts/ci-cd/quality/security-scan.sh
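Typical day-to-day usage, assuming `tofu`, `ansible-playbook`, and `podman-compose` are already on the PATH:

```bash
make help          # list every target with its description
make setup         # one-time development environment bootstrap
make init plan     # review the OpenTofu plan before applying
make apply         # apply the infrastructure changes
make test lint     # run the test suite and the linters
```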
|
||||
|
|
@ -0,0 +1,20 @@
|
|||
[defaults]
|
||||
inventory = inventory.ini
|
||||
host_key_checking = False
|
||||
forks = 8
|
||||
timeout = 30
|
||||
gathering = smart
|
||||
fact_caching = memory
|
||||
# Support the new playbooks directory layout
|
||||
roles_path = playbooks/
|
||||
collections_path = playbooks/
|
||||
# Enable SSH key authentication
|
||||
ansible_ssh_common_args = '-o PreferredAuthentications=publickey -o PubkeyAuthentication=yes'
|
||||
|
||||
[ssh_connection]
|
||||
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o PreferredAuthentications=publickey -o PubkeyAuthentication=yes
|
||||
pipelining = True
|
||||
|
||||
[inventory]
|
||||
# Enable plugins to support dynamic inventories
|
||||
enable_plugins = host_list, script, auto, yaml, ini, toml
|
||||
|
|
@ -0,0 +1,57 @@
|
|||
---
|
||||
- name: Clean up Consul configuration from dedicated clients
|
||||
hosts: hcp1,influxdb1,browser
|
||||
become: yes
|
||||
|
||||
tasks:
|
||||
- name: Stop Consul service
|
||||
systemd:
|
||||
name: consul
|
||||
state: stopped
|
||||
enabled: no
|
||||
|
||||
- name: Disable Consul service
|
||||
systemd:
|
||||
name: consul
|
||||
enabled: no
|
||||
|
||||
- name: Kill any remaining Consul processes
|
||||
shell: |
|
||||
pkill -f consul || true
|
||||
sleep 2
|
||||
pkill -9 -f consul || true
|
||||
ignore_errors: yes
|
||||
|
||||
- name: Remove Consul systemd service file
|
||||
file:
|
||||
path: /etc/systemd/system/consul.service
|
||||
state: absent
|
||||
|
||||
- name: Remove Consul configuration directory
|
||||
file:
|
||||
path: /etc/consul.d
|
||||
state: absent
|
||||
|
||||
- name: Remove Consul data directory
|
||||
file:
|
||||
path: /opt/consul
|
||||
state: absent
|
||||
|
||||
- name: Reload systemd daemon
|
||||
systemd:
|
||||
daemon_reload: yes
|
||||
|
||||
- name: Verify Consul is stopped
|
||||
shell: |
|
||||
if pgrep -f consul; then
|
||||
echo "Consul still running"
|
||||
exit 1
|
||||
else
|
||||
echo "Consul stopped successfully"
|
||||
fi
|
||||
register: consul_status
|
||||
failed_when: consul_status.rc != 0
|
||||
|
||||
- name: Display cleanup status
|
||||
debug:
|
||||
msg: "Consul cleanup completed on {{ inventory_hostname }}"
|
||||
|
|
@ -0,0 +1,55 @@
|
|||
---
|
||||
- name: Configure Consul Auto-Discovery
|
||||
hosts: all
|
||||
become: yes
|
||||
vars:
|
||||
consul_servers:
|
||||
- "warden.tailnet-68f9.ts.net:8301"
|
||||
- "ch4.tailnet-68f9.ts.net:8301"
|
||||
- "ash3c.tailnet-68f9.ts.net:8301"
|
||||
|
||||
tasks:
|
||||
- name: Backup current nomad.hcl
|
||||
copy:
|
||||
src: /etc/nomad.d/nomad.hcl
|
||||
dest: /etc/nomad.d/nomad.hcl.backup.{{ ansible_date_time.epoch }}
|
||||
remote_src: yes
|
||||
backup: yes
|
||||
|
||||
- name: Update Consul configuration for auto-discovery
|
||||
blockinfile:
|
||||
path: /etc/nomad.d/nomad.hcl
|
||||
marker: "# {mark} ANSIBLE MANAGED CONSUL CONFIG"
|
||||
block: |
|
||||
consul {
|
||||
retry_join = [
|
||||
"warden.tailnet-68f9.ts.net:8301",
|
||||
"ch4.tailnet-68f9.ts.net:8301",
|
||||
"ash3c.tailnet-68f9.ts.net:8301"
|
||||
]
|
||||
server_service_name = "nomad"
|
||||
client_service_name = "nomad-client"
|
||||
}
|
||||
insertbefore: '^consul \{'
|
||||
replace: '^consul \{.*?\}'
|
||||
|
||||
- name: Restart Nomad service
|
||||
systemd:
|
||||
name: nomad
|
||||
state: restarted
|
||||
enabled: yes
|
||||
|
||||
- name: Wait for Nomad to be ready
|
||||
wait_for:
|
||||
port: 4646
|
||||
host: "{{ ansible_default_ipv4.address }}"
|
||||
delay: 5
|
||||
timeout: 30
|
||||
|
||||
- name: Verify Consul connection
|
||||
shell: |
|
||||
NOMAD_ADDR=http://localhost:4646 nomad node status | grep -q "ready"
|
||||
register: nomad_ready
|
||||
failed_when: nomad_ready.rc != 0
|
||||
retries: 3
|
||||
delay: 10
|
||||
|
|
@ -0,0 +1,75 @@
|
|||
---
|
||||
- name: Remove Consul configuration from Nomad servers
|
||||
hosts: semaphore,ash1d,ash2e,ch2,ch3,onecloud1,de
|
||||
become: yes
|
||||
|
||||
tasks:
|
||||
- name: Remove entire Consul configuration block
|
||||
blockinfile:
|
||||
path: /etc/nomad.d/nomad.hcl
|
||||
marker: "# {mark} ANSIBLE MANAGED CONSUL CONFIG"
|
||||
state: absent
|
||||
|
||||
- name: Remove Consul configuration lines
|
||||
lineinfile:
|
||||
path: /etc/nomad.d/nomad.hcl
|
||||
regexp: '^consul \{'
|
||||
state: absent
|
||||
|
||||
- name: Remove Consul configuration content
|
||||
lineinfile:
|
||||
path: /etc/nomad.d/nomad.hcl
|
||||
regexp: '^ address ='
|
||||
state: absent
|
||||
|
||||
- name: Remove Consul service names
|
||||
lineinfile:
|
||||
path: /etc/nomad.d/nomad.hcl
|
||||
regexp: '^ server_service_name ='
|
||||
state: absent
|
||||
|
||||
- name: Remove Consul client service name
|
||||
lineinfile:
|
||||
path: /etc/nomad.d/nomad.hcl
|
||||
regexp: '^ client_service_name ='
|
||||
state: absent
|
||||
|
||||
- name: Remove Consul auto-advertise
|
||||
lineinfile:
|
||||
path: /etc/nomad.d/nomad.hcl
|
||||
regexp: '^ auto_advertise ='
|
||||
state: absent
|
||||
|
||||
- name: Remove Consul server auto-join
|
||||
lineinfile:
|
||||
path: /etc/nomad.d/nomad.hcl
|
||||
regexp: '^ server_auto_join ='
|
||||
state: absent
|
||||
|
||||
- name: Remove Consul client auto-join
|
||||
lineinfile:
|
||||
path: /etc/nomad.d/nomad.hcl
|
||||
regexp: '^ client_auto_join ='
|
||||
state: absent
|
||||
|
||||
- name: Remove Consul closing brace
|
||||
lineinfile:
|
||||
path: /etc/nomad.d/nomad.hcl
|
||||
regexp: '^}'
|
||||
state: absent
|
||||
|
||||
- name: Restart Nomad service
|
||||
systemd:
|
||||
name: nomad
|
||||
state: restarted
|
||||
|
||||
- name: Wait for Nomad to be ready
|
||||
wait_for:
|
||||
port: 4646
|
||||
host: "{{ ansible_default_ipv4.address }}"
|
||||
delay: 5
|
||||
timeout: 30
|
||||
|
||||
- name: Display completion message
|
||||
debug:
|
||||
msg: "Removed Consul configuration from {{ inventory_hostname }}"
|
||||
|
|
@ -0,0 +1,32 @@
|
|||
---
|
||||
- name: Enable Nomad Client Mode on Servers
|
||||
hosts: ch2,ch3,de
|
||||
become: yes
|
||||
|
||||
tasks:
|
||||
- name: Enable Nomad client mode
|
||||
lineinfile:
|
||||
path: /etc/nomad.d/nomad.hcl
|
||||
regexp: '^client \{'
|
||||
line: 'client {'
|
||||
state: present
|
||||
|
||||
- name: Enable client mode
|
||||
lineinfile:
|
||||
path: /etc/nomad.d/nomad.hcl
|
||||
regexp: '^ enabled = false'
|
||||
line: ' enabled = true'
|
||||
state: present
|
||||
|
||||
- name: Restart Nomad service
|
||||
systemd:
|
||||
name: nomad
|
||||
state: restarted
|
||||
|
||||
- name: Wait for Nomad to be ready
|
||||
wait_for:
|
||||
port: 4646
|
||||
host: "{{ ansible_default_ipv4.address }}"
|
||||
delay: 5
|
||||
timeout: 30
|
||||
|
||||
|
|
@ -0,0 +1,38 @@
|
|||
client {
|
||||
enabled = true
|
||||
# Addresses of the seven Nomad server nodes ("seven sisters")
|
||||
servers = [
|
||||
"100.116.158.95:4647", # bj-semaphore
|
||||
"100.81.26.3:4647", # ash1d
|
||||
"100.103.147.94:4647", # ash2e
|
||||
"100.90.159.68:4647", # ch2
|
||||
"100.86.141.112:4647", # ch3
|
||||
"100.98.209.50:4647", # bj-onecloud1
|
||||
"100.120.225.29:4647" # de
|
||||
]
|
||||
host_volume "fnsync" {
|
||||
path = "/mnt/fnsync"
|
||||
read_only = false
|
||||
}
|
||||
# Do not use the Docker driver; only Podman is used
|
||||
options {
|
||||
"driver.raw_exec.enable" = "1"
|
||||
"driver.exec.enable" = "1"
|
||||
}
|
||||
}

plugin_dir = "/opt/nomad/plugins"
|
||||
|
||||
# Configure the Podman driver
|
||||
plugin "podman" {
|
||||
config {
|
||||
volumes {
|
||||
enabled = true
|
||||
}
|
||||
logging {
|
||||
type = "journald"
|
||||
}
|
||||
gc {
|
||||
container = true
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
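A small sanity-check sketch for this client configuration, assuming it has been rendered to `/etc/nomad.d/nomad.hcl` on a client node (the same path the deployment playbooks below use):

```bash
# Validate the configuration before restarting the agent.
nomad config validate /etc/nomad.d/nomad.hcl

# After a restart, the node should register and list podman among its drivers
# and fnsync among its host volumes.
systemctl restart nomad
nomad node status -self -verbose | grep -iE 'podman|fnsync'
```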
@ -0,0 +1,62 @@
|
|||
---
|
||||
- name: Fix all master references to ch4
|
||||
hosts: localhost
|
||||
gather_facts: no
|
||||
vars:
|
||||
files_to_fix:
|
||||
- "scripts/diagnose-consul-sync.sh"
|
||||
- "scripts/register-traefik-to-all-consul.sh"
|
||||
- "deployment/ansible/playbooks/update-nomad-consul-config.yml"
|
||||
- "deployment/ansible/templates/nomad-server.hcl.j2"
|
||||
- "deployment/ansible/templates/nomad-client.hcl"
|
||||
- "deployment/ansible/playbooks/fix-nomad-consul-roles.yml"
|
||||
- "deployment/ansible/onecloud1_nomad.hcl"
|
||||
- "ansible/templates/consul-client.hcl.j2"
|
||||
- "ansible/consul-client-deployment.yml"
|
||||
- "ansible/consul-client-simple.yml"
|
||||
|
||||
tasks:
|
||||
- name: Replace master.tailnet-68f9.ts.net with ch4.tailnet-68f9.ts.net
|
||||
replace:
|
||||
path: "{{ item }}"
|
||||
regexp: 'master\.tailnet-68f9\.ts\.net'
|
||||
replace: 'ch4.tailnet-68f9.ts.net'
|
||||
loop: "{{ files_to_fix }}"
|
||||
when: item is file
|
||||
|
||||
- name: Replace master hostname references
|
||||
replace:
|
||||
path: "{{ item }}"
|
||||
regexp: '\bmaster\b'
|
||||
replace: 'ch4'
|
||||
loop: "{{ files_to_fix }}"
|
||||
when: item is file
|
||||
|
||||
- name: Replace master IP references in comments
|
||||
replace:
|
||||
path: "{{ item }}"
|
||||
regexp: '# master'
|
||||
replace: '# ch4'
|
||||
loop: "{{ files_to_fix }}"
|
||||
when: item is file
|
||||
|
||||
- name: Fix inventory files
|
||||
replace:
|
||||
path: "{{ item }}"
|
||||
regexp: 'master ansible_host=master'
|
||||
replace: 'ch4 ansible_host=ch4'
|
||||
loop:
|
||||
- "deployment/ansible/inventories/production/inventory.ini"
|
||||
- "deployment/ansible/inventories/production/csol-consul-nodes.ini"
|
||||
- "deployment/ansible/inventories/production/nomad-clients.ini"
|
||||
- "deployment/ansible/inventories/production/master-ash3c.ini"
|
||||
- "deployment/ansible/inventories/production/consul-nodes.ini"
|
||||
- "deployment/ansible/inventories/production/vault.ini"
|
||||
|
||||
- name: Fix IP address references (100.117.106.136 comments)
|
||||
replace:
|
||||
path: "{{ item }}"
|
||||
regexp: '100\.117\.106\.136.*# master'
|
||||
replace: '100.117.106.136 # ch4'
|
||||
loop: "{{ files_to_fix }}"
|
||||
when: item is file
|
||||
|
|
@ -0,0 +1,2 @@
|
|||
ansible_ssh_pass: "3131"
|
||||
ansible_become_pass: "3131"
|
||||
|
|
@ -0,0 +1,108 @@
|
|||
# CSOL Consul Static Node Configuration Guide

## Overview

This directory contains the Consul static node configuration files for CSOL (Cloud Service Operations Layer). They define the server and client nodes of the Consul cluster so that team members can quickly understand and use the cluster.

## Configuration Files

### 1. csol-consul-nodes.ini
The primary Consul node inventory, with detailed information for every server and client node.

**Structure:**
- `[consul_servers]` - Consul server nodes (7 nodes)
- `[consul_clients]` - Consul client nodes (2 nodes)
- `[consul_cluster:children]` - group combining all cluster nodes
- `[consul_servers:vars]` - shared variables for server nodes
- `[consul_clients:vars]` - shared variables for client nodes
- `[consul_cluster:vars]` - shared variables for the whole cluster

**Usage:**
```bash
# Run an Ansible playbook with this inventory
ansible-playbook -i csol-consul-nodes.ini your-playbook.yml
```

### 2. csol-consul-nodes.json
The same node data in JSON format, intended for programmatic consumption.

**Structure:**
- `servers` - list of server nodes
- `clients` - list of client nodes
- `configuration` - cluster-wide configuration
- `notes` - node counts and remarks

**Usage:**
```bash
# Query the JSON file with jq
jq '.csol_consul_nodes.servers.nodes[].name' csol-consul-nodes.json

# Process the JSON file with a Python one-liner
python3 -c "import json; data=json.load(open('csol-consul-nodes.json')); print(data['csol_consul_nodes']['servers']['nodes'])"
```

### 3. consul-nodes.ini
The updated Consul node inventory that replaces the previous version.

### 4. consul-cluster.ini
Inventory of the Consul cluster server nodes, used mainly for cluster deployment and management.

## Node List

### Server nodes (7)

| Node name | IP address | Region | Role |
|---------|--------|------|------|
| ch2 | 100.90.159.68 | Oracle Cloud KR | server |
| ch3 | 100.86.141.112 | Oracle Cloud KR | server |
| ash1d | 100.81.26.3 | Oracle Cloud US | server |
| ash2e | 100.103.147.94 | Oracle Cloud US | server |
| onecloud1 | 100.98.209.50 | Armbian | server |
| de | 100.120.225.29 | Armbian | server |
| bj-semaphore | 100.116.158.95 | Semaphore | server |

### Client nodes (2)

| Node name | IP address | Port | Region | Role |
|---------|--------|------|------|------|
| master | 100.117.106.136 | 60022 | Oracle Cloud A1 | client |
| ash3c | 100.116.80.94 | - | Oracle Cloud A1 | client |

## Configuration Parameters

### Common settings
- `consul_version`: 1.21.5
- `datacenter`: dc1
- `encrypt_key`: 1EvGItLOB8nuHnSA0o+rO0zXzLeJl+U+Jfvuw0+H848=
- `client_addr`: 0.0.0.0
- `data_dir`: /opt/consul/data
- `config_dir`: /etc/consul.d
- `log_level`: INFO
- `port`: 8500

### Server-specific settings
- `consul_server`: true
- `bootstrap_expect`: 7
- `ui_config`: true

### Client-specific settings
- `consul_server`: false

## Notes

1. **Retired nodes**: the hcs node was retired on 2025-09-27 and is no longer included in the configuration.
2. **Faulty nodes**: the syd node is faulty and has been isolated; it is not included in the configuration.
3. **Ports**: the master node uses SSH port 60022; all other nodes use the default SSH port.
4. **Credentials**: all nodes use the same credentials (user: ben, password: 3131).
5. **bootstrap_expect**: set to 7, meaning the cluster expects 7 server nodes to form a quorum.

## Changelog

- 2025-06-17: initial version with the complete CSOL Consul node configuration.

## Maintenance

1. When adding a new node, update all configuration files at the same time.
2. When a node is retired or fails, remove it from the configuration and update the notes promptly.
3. Periodically verify node reachability and configuration correctness (see the sketch after this list).
4. After updating the configuration, update this README accordingly.
|
||||
|
|
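A minimal reachability sketch for item 3 of the maintenance checklist, assuming the commands are run from this inventory directory; `100.x.x.x` is a placeholder for any reachable Consul server's Tailscale IP:

```bash
# Ping every node defined in the inventory.
ansible -i csol-consul-nodes.ini consul_cluster -m ping -o

# Confirm the cluster has a leader and the expected peer set.
curl -s http://100.x.x.x:8500/v1/status/leader
curl -s http://100.x.x.x:8500/v1/status/peers | jq .
```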
@ -0,0 +1,47 @@
|
|||
# CSOL Consul cluster inventory - updated: 2025-06-17
# This file contains all CSOL Consul server nodes
|
||||
|
||||
[consul_servers]
|
||||
# Oracle Cloud 韩国区域 (KR)
|
||||
ch2 ansible_host=100.90.159.68 ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
ch3 ansible_host=100.86.141.112 ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
|
||||
# Oracle Cloud 美国区域 (US)
|
||||
ash1d ansible_host=100.81.26.3 ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
ash2e ansible_host=100.103.147.94 ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
|
||||
# Armbian 节点
|
||||
onecloud1 ansible_host=100.98.209.50 ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
de ansible_host=100.120.225.29 ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
|
||||
# Semaphore 节点
|
||||
bj-semaphore ansible_host=100.116.158.95 ansible_user=root
|
||||
|
||||
[consul_cluster:children]
|
||||
consul_servers
|
||||
|
||||
[consul_servers:vars]
|
||||
# Consul服务器配置
|
||||
ansible_ssh_common_args='-o StrictHostKeyChecking=no'
|
||||
consul_version=1.21.5
|
||||
consul_datacenter=dc1
|
||||
consul_encrypt_key=1EvGItLOB8nuHnSA0o+rO0zXzLeJl+U+Jfvuw0+H848=
|
||||
consul_bootstrap_expect=7
|
||||
consul_server=true
|
||||
consul_ui_config=true
|
||||
consul_client_addr=0.0.0.0
|
||||
consul_bind_addr="{{ ansible_default_ipv4.address }}"
|
||||
consul_data_dir=/opt/consul/data
|
||||
consul_config_dir=/etc/consul.d
|
||||
consul_log_level=INFO
|
||||
consul_port=8500
|
||||
|
||||
# === Node notes ===
# Server nodes (7):
# - Oracle Cloud KR: ch2, ch3
# - Oracle Cloud US: ash1d, ash2e
# - Armbian: onecloud1, de
# - Semaphore: bj-semaphore
#
# Note: the hcs node was retired (2025-09-27)
# Note: the syd node is faulty and has been isolated
|
||||
|
|
@ -0,0 +1,65 @@
|
|||
# CSOL Consul static node configuration
# Updated: 2025-06-17 (based on the actual Consul cluster state)
# This file contains all CSOL server and client nodes
|
||||
|
||||
[consul_servers]
|
||||
# 主要服务器节点 (全部为服务器模式)
|
||||
master ansible_host=100.117.106.136 ansible_user=ben ansible_password=3131 ansible_become_password=3131 ansible_port=60022
|
||||
ash3c ansible_host=100.116.80.94 ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
warden ansible_host=100.122.197.112 ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
|
||||
[consul_clients]
|
||||
# 客户端节点
|
||||
bj-warden ansible_host=100.122.197.112 ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
bj-hcp2 ansible_host=100.116.112.45 ansible_user=root ansible_password=313131 ansible_become_password=313131
|
||||
bj-influxdb ansible_host=100.100.7.4 ansible_user=root ansible_password=313131 ansible_become_password=313131
|
||||
bj-hcp1 ansible_host=100.97.62.111 ansible_user=root ansible_password=313131 ansible_become_password=313131
|
||||
|
||||
[consul_cluster:children]
|
||||
consul_servers
|
||||
consul_clients
|
||||
|
||||
[consul_servers:vars]
|
||||
# Consul服务器配置
|
||||
consul_server=true
|
||||
consul_bootstrap_expect=3
|
||||
consul_datacenter=dc1
|
||||
consul_encrypt_key=1EvGItLOB8nuHnSA0o+rO0zXzLeJl+U+Jfvuw0+H848=
|
||||
consul_client_addr=0.0.0.0
|
||||
consul_bind_addr="{{ ansible_default_ipv4.address }}"
|
||||
consul_data_dir=/opt/consul/data
|
||||
consul_config_dir=/etc/consul.d
|
||||
consul_log_level=INFO
|
||||
consul_port=8500
|
||||
consul_ui_config=true
|
||||
|
||||
[consul_clients:vars]
|
||||
# Consul客户端配置
|
||||
consul_server=false
|
||||
consul_datacenter=dc1
|
||||
consul_encrypt_key=1EvGItLOB8nuHnSA0o+rO0zXzLeJl+U+Jfvuw0+H848=
|
||||
consul_client_addr=0.0.0.0
|
||||
consul_bind_addr="{{ ansible_default_ipv4.address }}"
|
||||
consul_data_dir=/opt/consul/data
|
||||
consul_config_dir=/etc/consul.d
|
||||
consul_log_level=INFO
|
||||
|
||||
[consul_cluster:vars]
|
||||
# 通用配置
|
||||
ansible_ssh_common_args='-o StrictHostKeyChecking=no'
|
||||
ansible_ssh_private_key_file=~/.ssh/id_ed25519
|
||||
consul_version=1.21.5
|
||||
|
||||
# === Node notes ===
# Server nodes (3):
# - bj-semaphore: 100.116.158.95 (primary server node)
# - kr-master: 100.117.106.136 (Korea primary node)
# - us-ash3c: 100.116.80.94 (US server node)
#
# Client nodes (4):
# - bj-warden: 100.122.197.112 (Beijing client node)
# - bj-hcp2: 100.116.112.45 (Beijing HCP client node 2)
# - bj-influxdb: 100.100.7.4 (Beijing InfluxDB client node)
# - bj-hcp1: 100.97.62.111 (Beijing HCP client node 1)
#
# Note: this configuration reflects the actual Consul cluster state and includes 3 server nodes
|
||||
|
|
@ -0,0 +1,44 @@
|
|||
# Consul static node configuration
# This file contains all CSOL server and client nodes
# Updated: 2025-06-17 (based on the actual Consul cluster state)
|
||||
|
||||
# === CSOL 服务器节点 ===
|
||||
# 这些节点运行Consul服务器模式,参与集群决策和数据存储
|
||||
|
||||
[consul_servers]
|
||||
# 主要服务器节点 (全部为服务器模式)
|
||||
master ansible_host=100.117.106.136 ansible_user=ben ansible_password=3131 ansible_become_password=3131 ansible_port=60022
|
||||
ash3c ansible_host=100.116.80.94 ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
warden ansible_host=100.122.197.112 ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
|
||||
# === 节点分组 ===
|
||||
|
||||
[consul_cluster:children]
|
||||
consul_servers
|
||||
|
||||
[consul_servers:vars]
|
||||
# Consul服务器配置
|
||||
consul_server=true
|
||||
consul_bootstrap_expect=3
|
||||
consul_datacenter=dc1
|
||||
consul_encrypt_key=1EvGItLOB8nuHnSA0o+rO0zXzLeJl+U+Jfvuw0+H848=
|
||||
consul_client_addr=0.0.0.0
|
||||
consul_bind_addr="{{ ansible_default_ipv4.address }}"
|
||||
consul_data_dir=/opt/consul/data
|
||||
consul_config_dir=/etc/consul.d
|
||||
consul_log_level=INFO
|
||||
consul_port=8500
|
||||
consul_ui_config=true
|
||||
|
||||
[consul_cluster:vars]
|
||||
# 通用配置
|
||||
ansible_ssh_common_args='-o StrictHostKeyChecking=no'
|
||||
consul_version=1.21.5
|
||||
|
||||
# === Node notes ===
# Server nodes (3):
# - master: 100.117.106.136 (Korea primary node)
# - ash3c: 100.116.80.94 (US server node)
# - warden: 100.122.197.112 (Beijing server node, current cluster leader)
#
# Note: this configuration reflects the actual Consul cluster state; all nodes run in server mode
|
||||
|
|
@ -0,0 +1,126 @@
|
|||
{
|
||||
"csol_consul_nodes": {
|
||||
"updated_at": "2025-06-17",
|
||||
"description": "CSOL Consul静态节点配置",
|
||||
"servers": {
|
||||
"description": "Consul服务器节点,参与集群决策和数据存储",
|
||||
"nodes": [
|
||||
{
|
||||
"name": "ch2",
|
||||
"host": "100.90.159.68",
|
||||
"user": "ben",
|
||||
"password": "3131",
|
||||
"become_password": "3131",
|
||||
"region": "Oracle Cloud KR",
|
||||
"role": "server"
|
||||
},
|
||||
{
|
||||
"name": "ch3",
|
||||
"host": "100.86.141.112",
|
||||
"user": "ben",
|
||||
"password": "3131",
|
||||
"become_password": "3131",
|
||||
"region": "Oracle Cloud KR",
|
||||
"role": "server"
|
||||
},
|
||||
{
|
||||
"name": "ash1d",
|
||||
"host": "100.81.26.3",
|
||||
"user": "ben",
|
||||
"password": "3131",
|
||||
"become_password": "3131",
|
||||
"region": "Oracle Cloud US",
|
||||
"role": "server"
|
||||
},
|
||||
{
|
||||
"name": "ash2e",
|
||||
"host": "100.103.147.94",
|
||||
"user": "ben",
|
||||
"password": "3131",
|
||||
"become_password": "3131",
|
||||
"region": "Oracle Cloud US",
|
||||
"role": "server"
|
||||
},
|
||||
{
|
||||
"name": "onecloud1",
|
||||
"host": "100.98.209.50",
|
||||
"user": "ben",
|
||||
"password": "3131",
|
||||
"become_password": "3131",
|
||||
"region": "Armbian",
|
||||
"role": "server"
|
||||
},
|
||||
{
|
||||
"name": "de",
|
||||
"host": "100.120.225.29",
|
||||
"user": "ben",
|
||||
"password": "3131",
|
||||
"become_password": "3131",
|
||||
"region": "Armbian",
|
||||
"role": "server"
|
||||
},
|
||||
{
|
||||
"name": "bj-semaphore",
|
||||
"host": "100.116.158.95",
|
||||
"user": "root",
|
||||
"region": "Semaphore",
|
||||
"role": "server"
|
||||
}
|
||||
]
|
||||
},
|
||||
"clients": {
|
||||
"description": "Consul客户端节点,用于服务发现和健康检查",
|
||||
"nodes": [
|
||||
{
|
||||
"name": "ch4",
|
||||
"host": "100.117.106.136",
|
||||
"user": "ben",
|
||||
"password": "3131",
|
||||
"become_password": "3131",
|
||||
"port": 60022,
|
||||
"region": "Oracle Cloud A1",
|
||||
"role": "client"
|
||||
},
|
||||
{
|
||||
"name": "ash3c",
|
||||
"host": "100.116.80.94",
|
||||
"user": "ben",
|
||||
"password": "3131",
|
||||
"become_password": "3131",
|
||||
"region": "Oracle Cloud A1",
|
||||
"role": "client"
|
||||
}
|
||||
]
|
||||
},
|
||||
"configuration": {
|
||||
"consul_version": "1.21.5",
|
||||
"datacenter": "dc1",
|
||||
"encrypt_key": "1EvGItLOB8nuHnSA0o+rO0zXzLeJl+U+Jfvuw0+H848=",
|
||||
"client_addr": "0.0.0.0",
|
||||
"data_dir": "/opt/consul/data",
|
||||
"config_dir": "/etc/consul.d",
|
||||
"log_level": "INFO",
|
||||
"port": 8500,
|
||||
"bootstrap_expect": 7,
|
||||
"ui_config": true
|
||||
},
|
||||
"notes": {
|
||||
"server_count": 7,
|
||||
"client_count": 2,
|
||||
"total_nodes": 9,
|
||||
"retired_nodes": [
|
||||
{
|
||||
"name": "hcs",
|
||||
"retired_date": "2025-09-27",
|
||||
"reason": "节点退役"
|
||||
}
|
||||
],
|
||||
"isolated_nodes": [
|
||||
{
|
||||
"name": "syd",
|
||||
"reason": "故障节点,已隔离"
|
||||
}
|
||||
]
|
||||
}
|
||||
}
|
||||
}
|
||||
|
|
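A small consistency sketch: the counts recorded under `notes` can be cross-checked against the actual array lengths in the JSON file.

```bash
# Compare declared vs. actual node counts.
jq '.csol_consul_nodes | {declared_servers: .notes.server_count,
                          actual_servers: (.servers.nodes | length),
                          declared_clients: .notes.client_count,
                          actual_clients: (.clients.nodes | length)}' \
  csol-consul-nodes.json
```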
@ -0,0 +1,20 @@
|
|||
# Nomad cluster global configuration
# InfluxDB 2.x + Grafana monitoring settings

# InfluxDB 2.x connection settings
influxdb_url: "http://influxdb1.tailnet-68f9.ts.net:8086"
influxdb_token: "VU_dOCVZzqEHb9jSFsDe0bJlEBaVbiG4LqfoczlnmcbfrbmklSt904HJPL4idYGvVi0c2eHkYDi2zCTni7Ay4w=="
influxdb_org: "seekkey"      # organization name
influxdb_bucket: "VPS"       # bucket name

# Remote Telegraf configuration URL
telegraf_config_url: "http://influxdb1.tailnet-68f9.ts.net:8086/api/v2/telegrafs/0f8a73496790c000"

# Monitoring thresholds
disk_usage_warning: 80       # disk usage warning threshold (%)
disk_usage_critical: 90      # disk usage critical threshold (%)
collection_interval: 30      # collection interval (seconds)

# Telegraf tuning
telegraf_log_level: "ERROR"            # log errors only
telegraf_disable_local_logs: true      # disable local log files
|
||||
|
|
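A quick connectivity sketch for these monitoring settings; `/health` and `/api/v2/buckets` are standard InfluxDB 2.x endpoints, and `$INFLUX_TOKEN` stands in for the token defined above:

```bash
# Check that the InfluxDB instance is up.
curl -s http://influxdb1.tailnet-68f9.ts.net:8086/health

# Confirm the token can see the org's buckets (the VPS bucket should be listed).
curl -s -H "Authorization: Token $INFLUX_TOKEN" \
  "http://influxdb1.tailnet-68f9.ts.net:8086/api/v2/buckets?org=seekkey" | jq -r '.buckets[].name'
```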
@ -0,0 +1,37 @@
|
|||
[nomad_servers]
|
||||
# Server nodes (7 server nodes)
# ⚠️ Warning: with great power comes great responsibility; operate on server nodes with extreme care!
# ⚠️ Any operation on a server node can affect the stability of the entire cluster!
|
||||
semaphore ansible_host=127.0.0.1 ansible_user=root ansible_password=3131 ansible_become_password=3131 ansible_ssh_common_args="-o PreferredAuthentications=password -o PubkeyAuthentication=no"
|
||||
ash1d ansible_host=ash1d.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
ash2e ansible_host=ash2e.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
ch2 ansible_host=ch2.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
ch3 ansible_host=ch3.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
onecloud1 ansible_host=onecloud1.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
de ansible_host=de.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
hcp1 ansible_host=hcp1.tailnet-68f9.ts.net ansible_user=root ansible_password=3131 ansible_become_password=3131
|
||||
|
||||
[nomad_clients]
|
||||
# Client nodes (5 client nodes)
|
||||
ch4 ansible_host=ch4.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
ash3c ansible_host=ash3c.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
browser ansible_host=browser.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
influxdb1 ansible_host=influxdb1.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
warden ansible_host=warden.tailnet-68f9.ts.net ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
|
||||
[nomad_nodes:children]
|
||||
nomad_servers
|
||||
nomad_clients
|
||||
|
||||
[nomad_nodes:vars]
|
||||
# NFS配置
|
||||
nfs_server=snail
|
||||
nfs_share=/fs/1000/nfs/Fnsync
|
||||
mount_point=/mnt/fnsync
|
||||
|
||||
# Ansible配置
|
||||
ansible_ssh_common_args='-o StrictHostKeyChecking=no'
|
||||
|
||||
|
||||
[gitea]
|
||||
gitea ansible_host=gitea ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
|
|
@ -0,0 +1,98 @@
|
|||
[dev]
|
||||
dev1 ansible_host=dev1 ansible_user=ben ansible_become=yes ansible_become_pass=3131
|
||||
dev2 ansible_host=dev2 ansible_user=ben ansible_become=yes ansible_become_pass=3131
|
||||
|
||||
[oci_kr]
|
||||
#ch2 ansible_host=ch2 ansible_user=ben ansible_become=yes ansible_become_pass=3131 # 过期节点,已移除 (2025-09-30)
|
||||
#ch3 ansible_host=ch3 ansible_user=ben ansible_become=yes ansible_become_pass=3131 # 过期节点,已移除 (2025-09-30)
|
||||
|
||||
[oci_us]
|
||||
ash1d ansible_host=ash1d ansible_user=ben ansible_become=yes ansible_become_pass=3131
|
||||
ash2e ansible_host=ash2e ansible_user=ben ansible_become=yes ansible_become_pass=3131
|
||||
|
||||
[oci_a1]
|
||||
ch4 ansible_host=ch4 ansible_user=ben ansible_become=yes ansible_become_pass=3131
|
||||
ash3c ansible_host=ash3c ansible_user=ben ansible_become=yes ansible_become_pass=3131
|
||||
|
||||
|
||||
[huawei]
|
||||
# hcs 节点已退役 (2025-09-27)
|
||||
[google]
|
||||
benwork ansible_host=benwork ansible_user=ben ansible_become=yes ansible_become_pass=3131
|
||||
|
||||
[digitalocean]
|
||||
# syd ansible_host=syd ansible_user=ben ansible_become=yes ansible_become_pass=3131 # 故障节点,已隔离
|
||||
|
||||
[faulty_cloud_servers]
|
||||
# 故障的云服务器节点,需要通过 OpenTofu 和 Consul 解决
|
||||
# hcs 节点已退役 (2025-09-27)
|
||||
syd ansible_host=syd ansible_user=ben ansible_become=yes ansible_become_pass=3131
|
||||
|
||||
[aws]
|
||||
#aws linux dnf
|
||||
awsirish ansible_host=awsirish ansible_user=ben ansible_become=yes ansible_become_pass=3131
|
||||
|
||||
[proxmox]
|
||||
pve ansible_host=pve ansible_user=root ansible_become=yes ansible_become_pass=Aa313131@ben
|
||||
xgp ansible_host=xgp ansible_user=root ansible_become=yes ansible_become_pass=Aa313131@ben
|
||||
nuc12 ansible_host=nuc12 ansible_user=root ansible_become=yes ansible_become_pass=Aa313131@ben
|
||||
|
||||
[lxc]
|
||||
#集中在三台机器,不要同时upgrade 会死掉,顺序调度来 (Debian/Ubuntu containers using apt)
|
||||
gitea ansible_host=gitea.tailnet-68f9.ts.net ansible_user=ben ansible_ssh_private_key_file=/root/.ssh/gitea ansible_become=yes ansible_become_pass=3131
|
||||
mysql ansible_host=mysql ansible_user=root ansible_become=yes ansible_become_pass=313131
|
||||
postgresql ansible_host=postgresql ansible_user=root ansible_become=yes ansible_become_pass=313131
|
||||
|
||||
[nomadlxc]
|
||||
influxdb ansible_host=influxdb1 ansible_user=root ansible_become=yes ansible_become_pass=313131
|
||||
warden ansible_host=warden ansible_user=ben ansible_become=yes ansible_become_pass=3131
|
||||
[semaphore]
|
||||
#semaphoressh ansible_host=localhost ansible_user=root ansible_become=yes ansible_become_pass=313131 ansible_ssh_pass=313131 # 过期节点,已移除 (2025-09-30)
|
||||
|
||||
[alpine]
|
||||
#Alpine Linux containers using apk package manager
|
||||
redis ansible_host=redis ansible_user=root ansible_become=yes ansible_become_pass=313131
|
||||
authentik ansible_host=authentik ansible_user=root ansible_become=yes ansible_become_pass=313131
|
||||
calibreweb ansible_host=calibreweb ansible_user=root ansible_become=yes ansible_become_pass=313131
|
||||
qdrant ansible_host=qdrant ansible_user=root ansible_become=yes
|
||||
|
||||
[vm]
|
||||
kali ansible_host=kali ansible_user=ben ansible_become=yes ansible_become_pass=3131
|
||||
|
||||
[hcp]
|
||||
hcp1 ansible_host=hcp1 ansible_user=root ansible_become=yes ansible_become_pass=313131
|
||||
hcp2 ansible_host=hcp2 ansible_user=root ansible_become=yes ansible_become_pass=313131
|
||||
|
||||
[feiniu]
|
||||
snail ansible_host=snail ansible_user=houzhongxu ansible_ssh_pass=Aa313131@ben ansible_become=yes ansible_become_pass=Aa313131@ben
|
||||
|
||||
[armbian]
|
||||
onecloud1 ansible_host=100.98.209.50 ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
de ansible_host=100.120.225.29 ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
|
||||
[beijing:children]
|
||||
nomadlxc
|
||||
hcp
|
||||
|
||||
[all:vars]
|
||||
ansible_ssh_common_args='-o StrictHostKeyChecking=no'
|
||||
|
||||
[nomad_clients:children]
|
||||
nomadlxc
|
||||
hcp
|
||||
oci_a1
|
||||
huawei
|
||||
digitalocean
|
||||
[nomad_servers:children]
|
||||
oci_us
|
||||
oci_kr
|
||||
semaphore
|
||||
armbian
|
||||
|
||||
[nomad_cluster:children]
|
||||
nomad_servers
|
||||
nomad_clients
|
||||
|
||||
|
|
@ -0,0 +1,7 @@
|
|||
[target_nodes]
|
||||
master ansible_host=100.117.106.136 ansible_port=60022 ansible_user=ben ansible_become=yes ansible_become_pass=3131
|
||||
ash3c ansible_host=100.116.80.94 ansible_user=ben ansible_become=yes ansible_become_pass=3131
|
||||
semaphore ansible_host=100.116.158.95 ansible_user=ben ansible_become=yes ansible_become_pass=3131
|
||||
|
||||
[target_nodes:vars]
|
||||
ansible_ssh_common_args='-o StrictHostKeyChecking=no'
|
||||
|
|
@ -0,0 +1,14 @@
|
|||
# Nomad client node configuration
# This file contains the 6 nodes to be configured as Nomad clients
|
||||
|
||||
[nomad_clients]
|
||||
bj-hcp1 ansible_host=bj-hcp1 ansible_user=root ansible_password=313131 ansible_become_password=313131
|
||||
bj-influxdb ansible_host=bj-influxdb ansible_user=root ansible_password=313131 ansible_become_password=313131
|
||||
bj-warden ansible_host=bj-warden ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
bj-hcp2 ansible_host=bj-hcp2 ansible_user=root ansible_password=313131 ansible_become_password=313131
|
||||
kr-master ansible_host=master ansible_port=60022 ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
us-ash3c ansible_host=ash3c ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
|
||||
[nomad_clients:vars]
|
||||
ansible_ssh_common_args='-o StrictHostKeyChecking=no'
|
||||
client_ip="{{ ansible_host }}"
|
||||
|
|
@ -0,0 +1,12 @@
|
|||
[consul_servers:children]
|
||||
nomad_servers
|
||||
|
||||
[consul_servers:vars]
|
||||
consul_cert_dir=/etc/consul.d/certs
|
||||
consul_ca_src=security/certificates/ca.pem
|
||||
consul_cert_src=security/certificates/consul-server.pem
|
||||
consul_key_src=security/certificates/consul-server-key.pem
|
||||
|
||||
[nomad_cluster:children]
|
||||
nomad_servers
|
||||
nomad_clients
|
||||
|
|
@ -0,0 +1,7 @@
|
|||
[vault_servers]
|
||||
master ansible_host=100.117.106.136 ansible_user=ben ansible_password=3131 ansible_become_password=3131 ansible_port=60022
|
||||
ash3c ansible_host=100.116.80.94 ansible_user=ben ansible_password=3131 ansible_become_password=3131
|
||||
warden ansible_host=warden ansible_user=ben ansible_become=yes ansible_become_pass=3131
|
||||
|
||||
[vault_servers:vars]
|
||||
ansible_ssh_common_args='-o StrictHostKeyChecking=no'
|
||||
|
|
@ -0,0 +1,50 @@
|
|||
datacenter = "dc1"
|
||||
data_dir = "/opt/nomad/data"
|
||||
plugin_dir = "/opt/nomad/plugins"
|
||||
log_level = "INFO"
|
||||
name = "onecloud1"
|
||||
|
||||
bind_addr = "100.98.209.50"
|
||||
|
||||
addresses {
|
||||
http = "100.98.209.50"
|
||||
rpc = "100.98.209.50"
|
||||
serf = "100.98.209.50"
|
||||
}
|
||||
|
||||
ports {
|
||||
http = 4646
|
||||
rpc = 4647
|
||||
serf = 4648
|
||||
}
|
||||
|
||||
server {
|
||||
enabled = true
|
||||
bootstrap_expect = 3
|
||||
retry_join = ["100.81.26.3", "100.103.147.94", "100.90.159.68", "100.86.141.112", "100.98.209.50", "100.120.225.29"]
|
||||
}
|
||||
|
||||
client {
|
||||
enabled = false
|
||||
}
|
||||
|
||||
plugin "nomad-driver-podman" {
|
||||
config {
|
||||
socket_path = "unix:///run/podman/podman.sock"
|
||||
volumes {
|
||||
enabled = true
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
consul {
|
||||
address = "100.117.106.136:8500,100.116.80.94:8500,100.122.197.112:8500" # master, ash3c, warden
|
||||
}
|
||||
|
||||
vault {
|
||||
enabled = true
|
||||
address = "http://100.117.106.136:8200,http://100.116.80.94:8200,http://100.122.197.112:8200" # master, ash3c, warden
|
||||
token = "hvs.A5Fu4E1oHyezJapVllKPFsWg"
|
||||
create_from_role = "nomad-cluster"
|
||||
tls_skip_verify = true
|
||||
}
|
||||
|
|
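A minimal sketch for checking this server's view of the cluster after a restart; the address is onecloud1's `bind_addr` from the file above:

```bash
# Gossip membership and raft peer set as seen from onecloud1.
NOMAD_ADDR=http://100.98.209.50:4646 nomad server members
NOMAD_ADDR=http://100.98.209.50:4646 nomad operator raft list-peers
```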
@ -0,0 +1,202 @@
|
|||
---
|
||||
- name: Add Warden Server as Nomad Client to Cluster
|
||||
hosts: warden
|
||||
become: yes
|
||||
gather_facts: yes
|
||||
|
||||
vars:
|
||||
nomad_plugin_dir: "/opt/nomad/plugins"
|
||||
nomad_datacenter: "dc1"
|
||||
nomad_region: "global"
|
||||
nomad_servers:
|
||||
- "100.117.106.136:4647"
|
||||
- "100.116.80.94:4647"
|
||||
- "100.97.62.111:4647"
|
||||
- "100.116.112.45:4647"
|
||||
- "100.84.197.26:4647"
|
||||
|
||||
tasks:
|
||||
- name: 显示当前处理的节点
|
||||
debug:
|
||||
msg: "🔧 将 warden 服务器添加为 Nomad 客户端: {{ inventory_hostname }}"
|
||||
|
||||
- name: 检查 Nomad 是否已安装
|
||||
shell: which nomad || echo "not_found"
|
||||
register: nomad_check
|
||||
changed_when: false
|
||||
|
||||
- name: 下载并安装 Nomad
|
||||
block:
|
||||
- name: 下载 Nomad 1.10.5
|
||||
get_url:
|
||||
url: "https://releases.hashicorp.com/nomad/1.10.5/nomad_1.10.5_linux_amd64.zip"
|
||||
dest: "/tmp/nomad.zip"
|
||||
mode: '0644'
|
||||
|
||||
- name: 解压并安装 Nomad
|
||||
unarchive:
|
||||
src: "/tmp/nomad.zip"
|
||||
dest: "/usr/local/bin/"
|
||||
remote_src: yes
|
||||
owner: root
|
||||
group: root
|
||||
mode: '0755'
|
||||
|
||||
- name: 清理临时文件
|
||||
file:
|
||||
path: "/tmp/nomad.zip"
|
||||
state: absent
|
||||
when: nomad_check.stdout == "not_found"
|
||||
|
||||
- name: 验证 Nomad 安装
|
||||
shell: nomad version
|
||||
register: nomad_version_output
|
||||
|
||||
- name: 创建 Nomad 配置目录
|
||||
file:
|
||||
path: /etc/nomad.d
|
||||
state: directory
|
||||
owner: root
|
||||
group: root
|
||||
mode: '0755'
|
||||
|
||||
- name: 创建 Nomad 数据目录
|
||||
file:
|
||||
path: /opt/nomad/data
|
||||
state: directory
|
||||
owner: nomad
|
||||
group: nomad
|
||||
mode: '0755'
|
||||
ignore_errors: yes
|
||||
|
||||
- name: 创建 Nomad 插件目录
|
||||
file:
|
||||
path: "{{ nomad_plugin_dir }}"
|
||||
state: directory
|
||||
owner: nomad
|
||||
group: nomad
|
||||
mode: '0755'
|
||||
ignore_errors: yes
|
||||
|
||||
- name: 获取服务器 IP 地址
|
||||
shell: |
|
||||
ip route get 1.1.1.1 | grep -oP 'src \K\S+'
|
||||
register: server_ip_result
|
||||
changed_when: false
|
||||
|
||||
- name: 设置服务器 IP 变量
|
||||
set_fact:
|
||||
server_ip: "{{ server_ip_result.stdout }}"
|
||||
|
||||
- name: 停止 Nomad 服务(如果正在运行)
|
||||
systemd:
|
||||
name: nomad
|
||||
state: stopped
|
||||
ignore_errors: yes
|
||||
|
||||
- name: 创建 Nomad 客户端配置文件
|
||||
copy:
|
||||
content: |
|
||||
# Nomad Client Configuration for warden
|
||||
datacenter = "{{ nomad_datacenter }}"
|
||||
data_dir = "/opt/nomad/data"
|
||||
log_level = "INFO"
|
||||
bind_addr = "{{ server_ip }}"
|
||||
|
||||
server {
|
||||
enabled = false
|
||||
}
|
||||
|
||||
client {
|
||||
enabled = true
|
||||
servers = [
|
||||
{% for server in nomad_servers %}"{{ server }}"{% if not loop.last %}, {% endif %}{% endfor %}
|
||||
]
|
||||
}
|
||||
|
||||
plugin_dir = "{{ nomad_plugin_dir }}"
|
||||
|
||||
plugin "podman" {
|
||||
config {
|
||||
socket_path = "unix:///run/podman/podman.sock"
|
||||
volumes {
|
||||
enabled = true
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
consul {
|
||||
address = "127.0.0.1:8500"
|
||||
}
|
||||
dest: /etc/nomad.d/nomad.hcl
|
||||
owner: root
|
||||
group: root
|
||||
mode: '0644'
|
||||
|
||||
- name: 验证 Nomad 配置
|
||||
shell: nomad config validate /etc/nomad.d/nomad.hcl
|
||||
register: nomad_validate
|
||||
failed_when: nomad_validate.rc != 0
|
||||
|
||||
- name: 创建 Nomad systemd 服务文件
|
||||
copy:
|
||||
content: |
|
||||
[Unit]
|
||||
Description=Nomad
|
||||
Documentation=https://www.nomadproject.io/docs/
|
||||
Wants=network-online.target
|
||||
After=network-online.target
|
||||
|
||||
[Service]
|
||||
Type=notify
|
||||
User=root
|
||||
Group=root
|
||||
ExecStart=/usr/local/bin/nomad agent -config=/etc/nomad.d
|
||||
ExecReload=/bin/kill -HUP $MAINPID
|
||||
KillMode=process
|
||||
KillSignal=SIGINT
|
||||
TimeoutStopSec=5
|
||||
LimitNOFILE=65536
|
||||
LimitNPROC=32768
|
||||
Restart=on-failure
|
||||
RestartSec=2
|
||||
|
||||
[Install]
|
||||
WantedBy=multi-user.target
|
||||
dest: /etc/systemd/system/nomad.service
|
||||
mode: '0644'
|
||||
|
||||
- name: 重新加载 systemd 配置
|
||||
systemd:
|
||||
daemon_reload: yes
|
||||
|
||||
- name: 启动并启用 Nomad 服务
|
||||
systemd:
|
||||
name: nomad
|
||||
state: started
|
||||
enabled: yes
|
||||
|
||||
- name: 等待 Nomad 服务启动
|
||||
wait_for:
|
||||
port: 4646
|
||||
host: "{{ server_ip }}"
|
||||
delay: 5
|
||||
timeout: 60
|
||||
|
||||
- name: 检查 Nomad 客户端状态
|
||||
shell: nomad node status -self
|
||||
register: nomad_node_status
|
||||
retries: 5
|
||||
delay: 5
|
||||
until: nomad_node_status.rc == 0
|
||||
ignore_errors: yes
|
||||
|
||||
- name: 显示 Nomad 客户端配置结果
|
||||
debug:
|
||||
msg: |
|
||||
✅ warden 服务器已成功配置为 Nomad 客户端
|
||||
📦 Nomad 版本: {{ nomad_version_output.stdout.split('\n')[0] }}
|
||||
🌐 服务器 IP: {{ server_ip }}
|
||||
🏗️ 数据中心: {{ nomad_datacenter }}
|
||||
📊 客户端状态: {{ 'SUCCESS' if nomad_node_status.rc == 0 else 'PENDING' }}
|
||||
🚀 warden 现在是 Nomad 集群的一部分
|
||||
|
|
@ -0,0 +1,22 @@
|
|||
---
|
||||
- name: Thorough cleanup of Nomad configuration backup files
|
||||
hosts: nomad_nodes
|
||||
become: yes
|
||||
tasks:
|
||||
- name: Remove all backup files with various patterns
|
||||
shell: |
|
||||
find /etc/nomad.d/ -name "nomad.hcl.*" -not -name "nomad.hcl" -delete
|
||||
find /etc/nomad.d/ -name "*.bak" -delete
|
||||
find /etc/nomad.d/ -name "*.backup*" -delete
|
||||
find /etc/nomad.d/ -name "*.~" -delete
|
||||
find /etc/nomad.d/ -name "*.broken" -delete
|
||||
ignore_errors: yes
|
||||
|
||||
- name: List remaining files in /etc/nomad.d/
|
||||
command: ls -la /etc/nomad.d/
|
||||
register: remaining_files
|
||||
changed_when: false
|
||||
|
||||
- name: Display remaining files
|
||||
debug:
|
||||
var: remaining_files.stdout_lines
|
||||
|
|
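Before running the cleanup above, the same `find` patterns can be used without `-delete` as a dry run. A sketch, with `inventory.ini` as a placeholder inventory name:

```bash
# List the backup files that would be removed, without deleting anything.
ansible -i inventory.ini nomad_nodes -b -m shell \
  -a 'find /etc/nomad.d/ \( -name "nomad.hcl.*" -not -name "nomad.hcl" \) -o -name "*.bak" -o -name "*.backup*" -o -name "*.broken"'
```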
@ -0,0 +1,25 @@
|
|||
---
|
||||
- name: Cleanup Nomad configuration backup files
|
||||
hosts: nomad_nodes
|
||||
become: yes
|
||||
tasks:
|
||||
- name: Remove backup files from /etc/nomad.d/
|
||||
file:
|
||||
path: "{{ item }}"
|
||||
state: absent
|
||||
loop:
|
||||
- "/etc/nomad.d/*.bak"
|
||||
- "/etc/nomad.d/*.backup"
|
||||
- "/etc/nomad.d/*.~"
|
||||
- "/etc/nomad.d/*.broken"
|
||||
- "/etc/nomad.d/nomad.hcl.*"
|
||||
ignore_errors: yes
|
||||
|
||||
- name: List remaining files in /etc/nomad.d/
|
||||
command: ls -la /etc/nomad.d/
|
||||
register: remaining_files
|
||||
changed_when: false
|
||||
|
||||
- name: Display remaining files
|
||||
debug:
|
||||
var: remaining_files.stdout_lines
|
||||
|
|
@ -0,0 +1,39 @@
|
|||
---
|
||||
- name: 配置Nomad客户端节点
|
||||
hosts: nomad_clients
|
||||
become: yes
|
||||
vars:
|
||||
nomad_config_dir: /etc/nomad.d
|
||||
|
||||
tasks:
|
||||
- name: 创建Nomad配置目录
|
||||
file:
|
||||
path: "{{ nomad_config_dir }}"
|
||||
state: directory
|
||||
owner: root
|
||||
group: root
|
||||
mode: '0755'
|
||||
|
||||
- name: 复制Nomad客户端配置模板
|
||||
template:
|
||||
src: ../templates/nomad-client.hcl
|
||||
dest: "{{ nomad_config_dir }}/nomad.hcl"
|
||||
owner: root
|
||||
group: root
|
||||
mode: '0644'
|
||||
|
||||
- name: 启动Nomad服务
|
||||
systemd:
|
||||
name: nomad
|
||||
state: restarted
|
||||
enabled: yes
|
||||
daemon_reload: yes
|
||||
|
||||
- name: 检查Nomad服务状态
|
||||
command: systemctl status nomad
|
||||
register: nomad_status
|
||||
changed_when: false
|
||||
|
||||
- name: 显示Nomad服务状态
|
||||
debug:
|
||||
var: nomad_status.stdout_lines
|
||||
|
|
@ -0,0 +1,44 @@
|
|||
---
|
||||
- name: 统一配置所有Nomad节点
|
||||
hosts: nomad_nodes
|
||||
become: yes
|
||||
|
||||
tasks:
|
||||
- name: 备份当前Nomad配置
|
||||
copy:
|
||||
src: /etc/nomad.d/nomad.hcl
|
||||
dest: /etc/nomad.d/nomad.hcl.bak
|
||||
remote_src: yes
|
||||
ignore_errors: yes
|
||||
|
||||
- name: 生成统一Nomad配置
|
||||
template:
|
||||
src: ../templates/nomad-unified.hcl.j2
|
||||
dest: /etc/nomad.d/nomad.hcl
|
||||
owner: root
|
||||
group: root
|
||||
mode: '0644'
|
||||
|
||||
- name: 重启Nomad服务
|
||||
systemd:
|
||||
name: nomad
|
||||
state: restarted
|
||||
enabled: yes
|
||||
daemon_reload: yes
|
||||
|
||||
- name: 等待Nomad服务就绪
|
||||
wait_for:
|
||||
port: 4646
|
||||
host: "{{ inventory_hostname }}.tailnet-68f9.ts.net"
|
||||
delay: 10
|
||||
timeout: 60
|
||||
ignore_errors: yes
|
||||
|
||||
- name: 检查Nomad服务状态
|
||||
command: systemctl status nomad
|
||||
register: nomad_status
|
||||
changed_when: false
|
||||
|
||||
- name: 显示Nomad服务状态
|
||||
debug:
|
||||
var: nomad_status.stdout_lines
|
||||
|
|
@ -0,0 +1,62 @@
|
|||
---
|
||||
- name: Configure Nomad Dynamic Host Volumes for NFS
|
||||
hosts: nomad_clients
|
||||
become: yes
|
||||
vars:
|
||||
nfs_server: "snail"
|
||||
nfs_share: "/fs/1000/nfs/Fnsync"
|
||||
mount_point: "/mnt/fnsync"
|
||||
|
||||
tasks:
|
||||
- name: Stop Nomad service
|
||||
systemd:
|
||||
name: nomad
|
||||
state: stopped
|
||||
|
||||
- name: Update Nomad configuration for dynamic host volumes
|
||||
blockinfile:
|
||||
path: /etc/nomad.d/nomad.hcl
|
||||
marker: "# {mark} DYNAMIC HOST VOLUMES CONFIGURATION"
|
||||
block: |
|
||||
client {
|
||||
# 启用动态host volumes
|
||||
host_volume "fnsync" {
|
||||
path = "{{ mount_point }}"
|
||||
read_only = false
|
||||
}
|
||||
|
||||
# 添加NFS相关的节点元数据
|
||||
meta {
|
||||
nfs_server = "{{ nfs_server }}"
|
||||
nfs_share = "{{ nfs_share }}"
|
||||
nfs_mounted = "true"
|
||||
}
|
||||
}
|
||||
insertafter: 'client {'
|
||||
|
||||
- name: Start Nomad service
|
||||
systemd:
|
||||
name: nomad
|
||||
state: started
|
||||
enabled: yes
|
||||
|
||||
- name: Wait for Nomad to start
|
||||
wait_for:
|
||||
port: 4646
|
||||
delay: 10
|
||||
timeout: 60
|
||||
|
||||
- name: Check Nomad status
|
||||
command: nomad node status
|
||||
register: nomad_status
|
||||
ignore_errors: yes
|
||||
|
||||
- name: Display Nomad status
|
||||
debug:
|
||||
var: nomad_status.stdout_lines
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
|
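A verification sketch for the `fnsync` host volume, run on one client node; it assumes the NFS share is already mounted at `/mnt/fnsync` by the mount tooling referenced in the inventory:

```bash
# The mount must exist before Nomad advertises the host volume.
mountpoint /mnt/fnsync && df -h /mnt/fnsync

# After the restart, the volume should appear in the node's verbose status.
nomad node status -self -verbose | grep -i fnsync
```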
@ -0,0 +1,57 @@
|
|||
---
|
||||
- name: Configure Podman driver for all Nomad client nodes
|
||||
hosts: target_nodes
|
||||
become: yes
|
||||
|
||||
tasks:
|
||||
- name: Stop Nomad service
|
||||
systemd:
|
||||
name: nomad
|
||||
state: stopped
|
||||
|
||||
- name: Install Podman if not present
|
||||
package:
|
||||
name: podman
|
||||
state: present
|
||||
ignore_errors: yes
|
||||
|
||||
- name: Enable Podman socket
|
||||
systemd:
|
||||
name: podman.socket
|
||||
enabled: yes
|
||||
state: started
|
||||
ignore_errors: yes
|
||||
|
||||
- name: Update Nomad configuration to use Podman
|
||||
lineinfile:
|
||||
path: /etc/nomad.d/nomad.hcl
|
||||
regexp: '^plugin "docker"'
|
||||
line: 'plugin "podman" {'
|
||||
state: present
|
||||
|
||||
- name: Add Podman plugin configuration
|
||||
blockinfile:
|
||||
path: /etc/nomad.d/nomad.hcl
|
||||
marker: "# {mark} PODMAN PLUGIN CONFIG"
|
||||
block: |
|
||||
plugin "podman" {
|
||||
config {
|
||||
socket_path = "unix:///run/podman/podman.sock"
|
||||
volumes {
|
||||
enabled = true
|
||||
}
|
||||
}
|
||||
}
|
||||
insertafter: 'client {'
|
||||
|
||||
- name: Start Nomad service
|
||||
systemd:
|
||||
name: nomad
|
||||
state: started
|
||||
|
||||
- name: Wait for Nomad to be ready
|
||||
wait_for:
|
||||
port: 4646
|
||||
host: localhost
|
||||
delay: 5
|
||||
timeout: 30
|
||||
|
|
@ -0,0 +1,22 @@
|
|||
---
|
||||
- name: Configure NOPASSWD sudo for nomad user
|
||||
hosts: nomad_clients
|
||||
become: yes
|
||||
tasks:
|
||||
- name: Ensure sudoers.d directory exists
|
||||
file:
|
||||
path: /etc/sudoers.d
|
||||
state: directory
|
||||
owner: root
|
||||
group: root
|
||||
mode: '0750'
|
||||
|
||||
- name: Allow nomad user passwordless sudo for required commands
|
||||
copy:
|
||||
dest: /etc/sudoers.d/nomad
|
||||
content: |
|
||||
nomad ALL=(ALL) NOPASSWD: /usr/bin/apt, /usr/bin/systemctl, /bin/mkdir, /bin/chown, /bin/chmod, /bin/mv, /bin/sed, /usr/bin/tee, /usr/sbin/usermod, /usr/bin/unzip, /usr/bin/wget
|
||||
owner: root
|
||||
group: root
|
||||
mode: '0440'
|
||||
validate: 'visudo -cf %s'
|
||||
|
|
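A small sketch for verifying the drop-in on one host after this playbook runs:

```bash
visudo -cf /etc/sudoers.d/nomad     # syntax check of the installed file
sudo -l -U nomad | grep NOPASSWD    # confirm the allowed command list
```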
@ -0,0 +1,226 @@
|
|||
---
|
||||
- name: 配置 Nomad 集群使用 Tailscale 网络通讯
|
||||
hosts: nomad_cluster
|
||||
become: yes
|
||||
gather_facts: no
|
||||
vars:
|
||||
nomad_config_dir: "/etc/nomad.d"
|
||||
nomad_config_file: "{{ nomad_config_dir }}/nomad.hcl"
|
||||
|
||||
tasks:
|
||||
- name: 获取当前节点的 Tailscale IP
|
||||
shell: tailscale ip | head -1
|
||||
register: current_tailscale_ip
|
||||
changed_when: false
|
||||
ignore_errors: yes
|
||||
|
||||
- name: 计算用于 Nomad 的地址(优先 Tailscale,回退到 inventory 或 ansible_host)
|
||||
set_fact:
|
||||
node_addr: "{{ (current_tailscale_ip.stdout | default('')) is match('^100\\.') | ternary((current_tailscale_ip.stdout | trim), (hostvars[inventory_hostname].tailscale_ip | default(ansible_host))) }}"
|
||||
|
||||
- name: 确保 Nomad 配置目录存在
|
||||
file:
|
||||
path: "{{ nomad_config_dir }}"
|
||||
state: directory
|
||||
owner: root
|
||||
group: root
|
||||
mode: '0755'
|
||||
|
||||
- name: 生成 Nomad 服务器配置(使用 Tailscale)
|
||||
copy:
|
||||
dest: "{{ nomad_config_file }}"
|
||||
owner: root
|
||||
group: root
|
||||
mode: '0644'
|
||||
content: |
|
||||
datacenter = "{{ nomad_datacenter | default('dc1') }}"
|
||||
data_dir = "/opt/nomad/data"
|
||||
log_level = "INFO"
|
||||
|
||||
bind_addr = "{{ node_addr }}"
|
||||
|
||||
addresses {
|
||||
http = "{{ node_addr }}"
|
||||
rpc = "{{ node_addr }}"
|
||||
serf = "{{ node_addr }}"
|
||||
}
|
||||
|
||||
ports {
|
||||
http = 4646
|
||||
rpc = 4647
|
||||
serf = 4648
|
||||
}
|
||||
|
||||
server {
|
||||
enabled = true
|
||||
bootstrap_expect = {{ nomad_bootstrap_expect | default(4) }}
|
||||
|
||||
retry_join = [
|
||||
"100.116.158.95", # semaphore
|
||||
"100.103.147.94", # ash2e
|
||||
"100.81.26.3", # ash1d
|
||||
"100.90.159.68" # ch2
|
||||
]
|
||||
|
||||
encrypt = "{{ nomad_encrypt_key }}"
|
||||
}
|
||||
|
||||
client {
|
||||
enabled = false
|
||||
}
|
||||
|
||||
plugin "podman" {
|
||||
config {
|
||||
socket_path = "unix:///run/podman/podman.sock"
|
||||
volumes {
|
||||
enabled = true
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
consul {
|
||||
address = "{{ node_addr }}:8500"
|
||||
}
|
||||
when: nomad_role == "server"
|
||||
notify: restart nomad
|
||||
|
||||
- name: 生成 Nomad 客户端配置(使用 Tailscale)
|
||||
copy:
|
||||
dest: "{{ nomad_config_file }}"
|
||||
owner: root
|
||||
group: root
|
||||
mode: '0644'
|
||||
content: |
|
||||
datacenter = "{{ nomad_datacenter | default('dc1') }}"
|
||||
data_dir = "/opt/nomad/data"
|
||||
log_level = "INFO"
|
||||
|
||||
bind_addr = "{{ node_addr }}"
|
||||
|
||||
addresses {
|
||||
http = "{{ node_addr }}"
|
||||
rpc = "{{ node_addr }}"
|
||||
serf = "{{ node_addr }}"
|
||||
}
|
||||
|
||||
ports {
|
||||
http = 4646
|
||||
rpc = 4647
|
||||
serf = 4648
|
||||
}
|
||||
|
||||
server {
|
||||
enabled = false
|
||||
}
|
||||
|
||||
client {
|
||||
enabled = true
|
||||
network_interface = "tailscale0"
|
||||
cpu_total_compute = 0
|
||||
|
||||
servers = [
|
||||
"100.116.158.95:4647", # semaphore
|
||||
"100.103.147.94:4647", # ash2e
|
||||
"100.81.26.3:4647", # ash1d
|
||||
"100.90.159.68:4647" # ch2
|
||||
]
|
||||
}
|
||||
|
||||
plugin "podman" {
|
||||
config {
|
||||
socket_path = "unix:///run/podman/podman.sock"
|
||||
volumes {
|
||||
enabled = true
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
consul {
|
||||
address = "{{ node_addr }}:8500"
|
||||
}
|
||||
when: nomad_role == "client"
|
||||
notify: restart nomad
|
||||
|
||||
- name: 检查 Nomad 二进制文件位置
|
||||
shell: which nomad || find /usr -name nomad 2>/dev/null | head -1
|
||||
register: nomad_binary_path
|
||||
failed_when: nomad_binary_path.stdout == ""
|
||||
|
||||
- name: 创建/更新 Nomad systemd 服务文件
|
||||
copy:
|
||||
dest: "/etc/systemd/system/nomad.service"
|
||||
owner: root
|
||||
group: root
|
||||
mode: '0644'
|
||||
content: |
|
||||
[Unit]
|
||||
Description=Nomad
|
||||
Documentation=https://www.nomadproject.io/
|
||||
Requires=network-online.target
|
||||
After=network-online.target
|
||||
|
||||
[Service]
|
||||
Type=notify
|
||||
User=root
|
||||
Group=root
|
||||
ExecStart={{ nomad_binary_path.stdout }} agent -config=/etc/nomad.d/nomad.hcl
|
||||
ExecReload=/bin/kill -HUP $MAINPID
|
||||
KillMode=process
|
||||
Restart=on-failure
|
||||
LimitNOFILE=65536
|
||||
|
||||
[Install]
|
||||
WantedBy=multi-user.target
|
||||
notify: restart nomad
|
||||
|
||||
- name: 确保 Nomad 数据目录存在
|
||||
file:
|
||||
path: "/opt/nomad/data"
|
||||
state: directory
|
||||
owner: root
|
||||
group: root
|
||||
mode: '0755'
|
||||
|
||||
- name: 重新加载 systemd daemon
|
||||
systemd:
|
||||
daemon_reload: yes
|
||||
|
||||
- name: 启用并启动 Nomad 服务
|
||||
systemd:
|
||||
name: nomad
|
||||
enabled: yes
|
||||
state: started
|
||||
|
||||
- name: 等待 Nomad 服务启动
|
||||
wait_for:
|
||||
port: 4646
|
||||
host: "{{ node_addr }}"
|
||||
delay: 5
|
||||
timeout: 30
|
||||
ignore_errors: yes
|
||||
|
||||
- name: 检查 Nomad 服务状态
|
||||
shell: systemctl status nomad --no-pager -l
|
||||
register: nomad_status
|
||||
ignore_errors: yes
|
||||
|
||||
- name: 显示配置结果
|
||||
debug:
|
||||
msg: |
|
||||
✅ 节点 {{ inventory_hostname }} 配置完成
|
||||
🌐 使用地址: {{ node_addr }}
|
||||
🎯 角色: {{ nomad_role }}
|
||||
🔧 Nomad 二进制: {{ nomad_binary_path.stdout }}
|
||||
📊 服务状态: {{ 'active' if nomad_status.rc == 0 else 'failed' }}
|
||||
{% if nomad_status.rc != 0 %}
|
||||
❌ 错误信息:
|
||||
{{ nomad_status.stdout }}
|
||||
{{ nomad_status.stderr }}
|
||||
{% endif %}
|
||||
|
||||
handlers:
|
||||
- name: restart nomad
|
||||
systemd:
|
||||
name: nomad
|
||||
state: restarted
|
||||
daemon_reload: yes
|
||||
|
|
@ -0,0 +1,115 @@
|
|||
---
|
||||
- name: Configure Podman for Nomad Integration
|
||||
hosts: all
|
||||
become: yes
|
||||
gather_facts: yes
|
||||
|
||||
tasks:
|
||||
- name: 显示当前处理的节点
|
||||
debug:
|
||||
msg: "🔧 正在为 Nomad 配置 Podman: {{ inventory_hostname }}"
|
||||
|
||||
- name: 确保 Podman 已安装
|
||||
package:
|
||||
name: podman
|
||||
state: present
|
||||
|
||||
- name: 启用并启动 Podman socket 服务
|
||||
systemd:
|
||||
name: podman.socket
|
||||
enabled: yes
|
||||
state: started
|
||||
|
||||
- name: 创建 Podman 系统配置目录
|
||||
file:
|
||||
path: /etc/containers
|
||||
state: directory
|
||||
mode: '0755'
|
||||
|
||||
- name: 配置 Podman 使用系统 socket
|
||||
copy:
|
||||
content: |
|
||||
[engine]
|
||||
# 使用系统级 socket 而不是用户级 socket
|
||||
active_service = "system"
|
||||
[engine.service_destinations]
|
||||
[engine.service_destinations.system]
|
||||
uri = "unix:///run/podman/podman.sock"
|
||||
dest: /etc/containers/containers.conf
|
||||
mode: '0644'
|
||||
|
||||
- name: 检查是否存在 nomad 用户
|
||||
getent:
|
||||
database: passwd
|
||||
key: nomad
|
||||
register: nomad_user_check
|
||||
ignore_errors: yes
|
||||
|
||||
- name: 为 nomad 用户创建配置目录
|
||||
file:
|
||||
path: "/home/nomad/.config/containers"
|
||||
state: directory
|
||||
owner: nomad
|
||||
group: nomad
|
||||
mode: '0755'
|
||||
when: nomad_user_check is succeeded
|
||||
|
||||
- name: 为 nomad 用户配置 Podman
|
||||
copy:
|
||||
content: |
|
||||
[engine]
|
||||
active_service = "system"
|
||||
[engine.service_destinations]
|
||||
[engine.service_destinations.system]
|
||||
uri = "unix:///run/podman/podman.sock"
|
||||
dest: /home/nomad/.config/containers/containers.conf
|
||||
owner: nomad
|
||||
group: nomad
|
||||
mode: '0644'
|
||||
when: nomad_user_check is succeeded
|
||||
|
||||
- name: 将 nomad 用户添加到 podman 组
|
||||
user:
|
||||
name: nomad
|
||||
groups: podman
|
||||
append: yes
|
||||
when: nomad_user_check is succeeded
|
||||
ignore_errors: yes
|
||||
|
||||
- name: 创建 podman 组(如果不存在)
|
||||
group:
|
||||
name: podman
|
||||
state: present
|
||||
ignore_errors: yes
|
||||
|
||||
- name: 设置 podman socket 目录权限
|
||||
file:
|
||||
path: /run/podman
|
||||
state: directory
|
||||
mode: '0755'
|
||||
group: podman
|
||||
ignore_errors: yes
|
||||
|
||||
- name: 验证 Podman socket 权限
|
||||
file:
|
||||
path: /run/podman/podman.sock
|
||||
mode: '0666'
|
||||
when: nomad_user_check is succeeded
|
||||
ignore_errors: yes
|
||||
|
||||
- name: 验证 Podman 安装
|
||||
shell: podman --version
|
||||
register: podman_version
|
||||
|
||||
- name: 测试 Podman 功能
|
||||
shell: podman info
|
||||
register: podman_info
|
||||
ignore_errors: yes
|
||||
|
||||
- name: 显示配置结果
|
||||
debug:
|
||||
msg: |
|
||||
✅ 节点 {{ inventory_hostname }} Podman 配置完成
|
||||
📦 Podman 版本: {{ podman_version.stdout }}
|
||||
🐳 Podman 状态: {{ 'SUCCESS' if podman_info.rc == 0 else 'WARNING' }}
|
||||
👤 Nomad 用户: {{ 'FOUND' if nomad_user_check is succeeded else 'NOT FOUND' }}
|
||||
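A final check sketch for the system-level Podman socket configured above, run on any configured node:

```bash
# The socket unit should be active and reachable through the remote client.
systemctl is-active podman.socket
podman --remote info > /dev/null && echo "podman system socket OK"
```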