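#!/bin/bash
# Nomad multi-datacenter node bootstrap script
# Datacenter: ${datacenter}

set -e

# Logging helper: timestamps each message and appends it to the setup log
log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a /var/log/nomad-setup.log
}

log "Starting Nomad node setup - datacenter: ${datacenter}"

# Update the system
log "Updating system packages..."
# Avoid interactive prompts that would stall an unattended run
export DEBIAN_FRONTEND=noninteractive
apt-get update -y
apt-get upgrade -y

# Install required packages
log "Installing required packages..."
apt-get install -y \
    curl \
    wget \
    unzip \
    jq \
    docker.io \
    docker-compose \
    htop \
    net-tools \
    vim

# Start Docker
log "Starting Docker service..."
systemctl enable docker
systemctl start docker
usermod -aG docker ubuntu

# Install Nomad
log "Installing Nomad ${nomad_version}..."
cd /tmp
wget -q https://releases.hashicorp.com/nomad/${nomad_version}/nomad_${nomad_version}_linux_amd64.zip
# -o overwrites a binary left behind by a previous run instead of prompting
unzip -o nomad_${nomad_version}_linux_amd64.zip
mv nomad /usr/local/bin/
chmod +x /usr/local/bin/nomad

# Create the Nomad user and directories
log "Creating Nomad user and directories..."
# Skip useradd if the user already exists, so a re-run does not abort under set -e
id nomad >/dev/null 2>&1 || useradd --system --home /etc/nomad.d --shell /bin/false nomad
mkdir -p /opt/nomad/data
mkdir -p /etc/nomad.d
mkdir -p /var/log/nomad
chown -R nomad:nomad /opt/nomad /etc/nomad.d /var/log/nomad

# Determine the local IP address
if [ "${bind_addr}" = "auto" ]; then
    # Try several sources in order: AWS metadata, GCP metadata, default route, hostname.
    # -f makes curl fail on HTTP errors; --max-time keeps it from hanging off-cloud.
    BIND_ADDR=$(curl -sf --max-time 2 http://169.254.169.254/latest/meta-data/local-ipv4 2>/dev/null || \
                curl -sf --max-time 2 http://metadata.google.internal/computeMetadata/v1/instance/network-interfaces/0/ip -H "Metadata-Flavor: Google" 2>/dev/null || \
                ip route get 8.8.8.8 | awk '{print $7; exit}' || \
                hostname -I | awk '{print $1}')
else
    BIND_ADDR="${bind_addr}"
fi

# Stop early rather than write a config with an empty bind address
if [ -z "$BIND_ADDR" ]; then
    log "Could not determine an IP address"
    exit 1
fi

log "Detected IP address: $BIND_ADDR"

# Create the Nomad configuration file
log "Creating Nomad configuration file..."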
cat > /etc/nomad.d/nomad.hcl << EOF
datacenter = "${datacenter}"
region     = "global"
data_dir   = "/opt/nomad/data"
bind_addr  = "$BIND_ADDR"

%{ if server_enabled }
server {
  enabled          = true
  bootstrap_expect = ${bootstrap_expect}
  encrypt          = "${nomad_encrypt_key}"
}
%{ endif }

%{ if client_enabled }
client {
  enabled = true

  host_volume "docker-sock" {
    path      = "/var/run/docker.sock"
    read_only = false
  }
}
%{ endif }

ui {
  enabled = true
}

addresses {
  http = "0.0.0.0"
  rpc  = "$BIND_ADDR"
  serf = "$BIND_ADDR"
}

ports {
  http = 4646
  rpc  = 4647
  serf = 4648
}

plugin "docker" {
  config {
    allow_privileged = true
    volumes {
      enabled = true
    }
  }
}

telemetry {
  collection_interval        = "10s"
  disable_hostname           = false
  prometheus_metrics         = true
  publish_allocation_metrics = true
  publish_node_metrics       = true
}

log_level = "INFO"
log_file  = "/var/log/nomad/nomad.log"
EOF

# The config holds the gossip encryption key, so restrict it to the nomad user
chown nomad:nomad /etc/nomad.d/nomad.hcl
chmod 640 /etc/nomad.d/nomad.hcl

# Create the systemd unit file
# Note: the unit runs the agent as the nomad user. Client agents that launch
# Docker tasks typically need root, so adjust User/Group on client nodes.
log "Creating systemd unit file..."
cat > /etc/systemd/system/nomad.service << EOF
[Unit]
Description=Nomad
Documentation=https://www.nomadproject.io/
Requires=network-online.target
After=network-online.target
ConditionFileNotEmpty=/etc/nomad.d/nomad.hcl

[Service]
Type=notify
User=nomad
Group=nomad
ExecStart=/usr/local/bin/nomad agent -config=/etc/nomad.d/nomad.hcl
ExecReload=/bin/kill -HUP \$MAINPID
KillMode=process
Restart=on-failure
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
EOF

# Start the Nomad service
log "Starting Nomad service..."
systemctl daemon-reload
systemctl enable nomad
systemctl start nomad

# Wait for the service to come up
log "Waiting for Nomad service to start..."
sleep 10

# Verify the installation
log "Verifying Nomad installation..."
if systemctl is-active --quiet nomad; then
    log "✅ Nomad service is running"
    log "📊 Node info:"
    /usr/local/bin/nomad node status -self || true
else
    log "❌ Nomad service failed to start"
    systemctl status nomad --no-pager || true
    journalctl -u nomad --no-pager -n 20 || true
fi

# Configure the firewall (only if ufw is installed)
log "Configuring firewall rules..."
if command -v ufw >/dev/null 2>&1; then
    ufw allow 4646/tcp  # HTTP API
    ufw allow 4647/tcp  # RPC
    ufw allow 4648/tcp  # Serf
    ufw allow 4648/udp  # Serf gossip also uses UDP
    ufw allow 22/tcp    # SSH
fi

# Create helper scripts and aliases
log "Creating management scripts..."
cat > /usr/local/bin/nomad-status << 'EOF'
#!/bin/bash
echo "=== Nomad service status ==="
systemctl status nomad --no-pager
echo -e "\n=== Nomad cluster members ==="
nomad server members 2>/dev/null || echo "Unable to connect to the cluster"
echo -e "\n=== Nomad node status ==="
nomad node status 2>/dev/null || echo "Unable to get node status"
echo -e "\n=== Recent logs ==="
journalctl -u nomad --no-pager -n 5
EOF

chmod +x /usr/local/bin/nomad-status

# Add aliases to the ubuntu user's .bashrc
echo 'alias ns="nomad-status"' >> /home/ubuntu/.bashrc
echo 'alias nomad-logs="journalctl -u nomad -f"' >> /home/ubuntu/.bashrc

log "🎉 Nomad node setup complete!"
log "📍 Datacenter: ${datacenter}"
log "🌐 IP address: $BIND_ADDR"
log "🔗 Web UI: http://$BIND_ADDR:4646"
log "📝 Run 'nomad-status' or 'ns' to check status"

# Publish the key details in the login message (motd)
cat > /etc/update-motd.d/99-nomad << EOF
#!/bin/bash
echo ""
echo "🚀 Nomad node info:"
echo "   Datacenter: ${datacenter}"
echo "   IP address: $BIND_ADDR"
echo "   Web UI: http://$BIND_ADDR:4646"
echo "   Status check: nomad-status"
echo ""
EOF

chmod +x /etc/update-motd.d/99-nomad

log "Node setup script finished"