Files
k8s-game-2048/docs/WEBHOOK_DEPLOYMENT.md
Greg 63b53dfc1b feat: Implement webhook-based deployment for k3s behind NAT
- Replace SSH/kubectl deployment with secure webhook-based approach
- Add comprehensive webhook handler with HMAC signature verification
- Support blue-green deployment strategy for production
- Implement auto-promotion pipeline: dev → staging → prod
- Add health checks using canonical Knative domains only
- Include complete deployment documentation and setup scripts

Changes:
- Updated deploy-dev.yml, deploy-staging.yml, deploy-prod.yml workflows
- Added webhook handler Python script with Flask API
- Created Kubernetes manifests for webhook system deployment
- Added ingress and service configuration for external access
- Created setup script for automated webhook system installation
- Documented complete webhook-based deployment guide

Perfect for k3s clusters behind NAT without direct API access.
2025-06-30 23:41:53 -07:00

7.7 KiB

Webhook-Based Deployment Guide

This guide explains how to set up the webhook-based deployment system for the k8s-game-2048 application, designed to work with k3s clusters behind NAT (no direct API access).

Overview

The deployment pipeline uses secure webhooks instead of direct kubectl/SSH access, making it perfect for k3s clusters behind NAT or firewall restrictions. Each environment (dev, staging, prod) has its own webhook endpoint that receives deployment instructions and applies them locally.

Architecture

GitHub Actions → HTTPS Webhook → Local Webhook Handler → kubectl apply

Deployment Flow

  1. Development: Triggered on push to main/master
  2. Staging: Auto-promoted from successful dev deployment
  3. Production: Auto-promoted from successful staging OR manual deployment with confirmation

Required Secrets

Configure these secrets in your GitHub repository settings:

GitHub Container Registry

  • GITHUB_TOKEN - Automatically provided by GitHub Actions

Webhook Endpoints

  • DEV_WEBHOOK_URL - Your development webhook endpoint
  • STAGING_WEBHOOK_URL - Your staging webhook endpoint
  • PROD_WEBHOOK_URL - Your production webhook endpoint

Security

  • WEBHOOK_SECRET - Shared secret for HMAC signature verification
  • KNATIVE_DOMAIN - Your Knative cluster domain (e.g., staging.wa.darknex.us)

Webhook Handler Implementation

You need to implement webhook handlers on your k3s cluster that:

  1. Receive webhook POST requests with deployment details
  2. Verify HMAC signatures for security
  3. Pull the specified Docker image
  4. Apply Kubernetes manifests
  5. Return deployment status

Example Webhook Payload

{
  "environment": "development",
  "image": "ghcr.io/owner/repo:tag",
  "namespace": "game-2048-dev",
  "service_name": "game-2048-dev",
  "deployment_id": "123456-1",
  "commit_sha": "abc123...",
  "triggered_by": "username",
  "timestamp": "2024-01-01T12:00:00Z",
  "auto_promotion": false,
  "deployment_strategy": "rolling" // or "blue-green" for prod
}

Security Headers

The webhook includes these security headers:

  • X-Signature-SHA256: HMAC-SHA256 signature of the payload
  • X-GitHub-Event: Always "deployment"
  • X-GitHub-Delivery: Unique delivery ID

Sample Webhook Handler (Python Flask)

import hashlib
import hmac
import json
import subprocess
from flask import Flask, request, jsonify

app = Flask(__name__)
WEBHOOK_SECRET = "your-webhook-secret"

def verify_signature(payload, signature):
    expected = hmac.new(
        WEBHOOK_SECRET.encode(),
        payload,
        hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(f"sha256={expected}", signature)

@app.route('/webhook/deploy', methods=['POST'])
def deploy():
    # Verify signature
    signature = request.headers.get('X-Signature-SHA256')
    if not verify_signature(request.data, signature):
        return jsonify({"error": "Invalid signature"}), 401
    
    data = request.json
    image = data['image']
    namespace = data['namespace']
    
    try:
        # Pull image
        subprocess.run(['docker', 'pull', image], check=True)
        
        # Apply manifests
        subprocess.run([
            'kubectl', 'apply', '-f', f'manifests/{data["environment"]}/'
        ], check=True)
        
        # Update image
        subprocess.run([
            'kubectl', 'patch', 'ksvc', data['service_name'],
            '-n', namespace,
            '--type', 'merge',
            '-p', f'{{"spec":{{"template":{{"spec":{{"containers":[{{"image":"{image}","imagePullPolicy":"Always"}}]}}}}}}}}'
        ], check=True)
        
        return jsonify({"status": "success", "deployment_id": data['deployment_id']})
        
    except subprocess.CalledProcessError as e:
        return jsonify({"error": str(e)}), 500

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8080)

Deployment Strategies

Development & Staging

  • Strategy: Rolling update
  • Traffic: Immediate 100% switch
  • Verification: Health check after 30-45 seconds

Production

  • Strategy: Blue-Green deployment
  • Traffic Split: 10% → 50% → 100% over 5 minutes
  • Verification: Extended health checks and response time validation

Health Checks

All environments use canonical Knative domains for health checks:

  • Dev: https://game-2048-dev.game-2048-dev.{KNATIVE_DOMAIN}
  • Staging: https://game-2048-staging.game-2048-staging.{KNATIVE_DOMAIN}
  • Prod: https://game-2048-prod.game-2048-prod.{KNATIVE_DOMAIN}

Auto-Promotion Pipeline

Push to main → Dev Deployment → Staging Deployment → Production (manual/auto)

Triggers

  • Dev: Automatic on code changes
  • Staging: Automatic on successful dev deployment
  • Prod: Automatic on successful staging deployment OR manual with confirmation

Manual Deployment

Staging

# Trigger staging deployment manually
gh workflow run deploy-staging.yml -f image_tag=v1.2.3

Production

# Trigger production deployment (requires confirmation)
gh workflow run deploy-prod.yml -f image_tag=v1.2.3 -f confirmation=DEPLOY

Monitoring & Debugging

GitHub Actions Logs

  • View deployment progress in Actions tab
  • Check webhook response codes and payloads
  • Monitor health check results

Cluster-Side Debugging

# Check webhook handler logs
kubectl logs -n webhook-system deployment/webhook-handler

# Check service status
kubectl get ksvc -n game-2048-dev

# Check recent deployments
kubectl get revisions -n game-2048-dev

Security Considerations

  1. HMAC Verification: All webhooks are signed with SHA-256 HMAC
  2. HTTPS Only: All webhook endpoints must use HTTPS
  3. Secret Rotation: Regularly rotate the WEBHOOK_SECRET
  4. Network Security: Consider IP allowlisting for webhook endpoints
  5. Audit Logging: Log all deployment requests with timestamps and users

Troubleshooting

Common Issues

Webhook Timeout

  • Symptom: HTTP 408 or connection timeout
  • Solution: Check webhook handler is running and accessible
  • Debug: Test webhook endpoint manually with curl

Signature Verification Failed

  • Symptom: HTTP 401 from webhook
  • Solution: Verify WEBHOOK_SECRET matches on both sides
  • Debug: Check HMAC calculation in webhook handler

Image Pull Errors

  • Symptom: Deployment fails after webhook success
  • Solution: Ensure image exists and registry credentials are configured
  • Debug: Check kubectl get events in the target namespace

Health Check Failures

  • Symptom: Deployment marked as failed despite successful webhook
  • Solution: Verify Knative domain configuration and service startup time
  • Debug: Check service logs and Knative serving controller logs

Manual Recovery

If automated deployment fails, you can deploy manually:

# Set image and apply manifests
kubectl patch ksvc game-2048-dev -n game-2048-dev \
  --type merge \
  -p '{"spec":{"template":{"spec":{"containers":[{"image":"ghcr.io/owner/repo:tag","imagePullPolicy":"Always"}]}}}}'

Benefits of Webhook-Based Deployment

  1. NAT-Friendly: Works with k3s clusters behind NAT/firewall
  2. Secure: HMAC-signed webhooks prevent unauthorized deployments
  3. Scalable: Can handle multiple clusters and environments
  4. Auditable: Full deployment history in GitHub Actions
  5. Flexible: Supports various deployment strategies
  6. Reliable: Retry logic and health checks ensure successful deployments

Next Steps

  1. Implement webhook handlers for each environment
  2. Configure webhook endpoints and secrets
  3. Test the deployment pipeline end-to-end
  4. Set up monitoring and alerting for webhook handlers
  5. Document environment-specific configuration