k8s-game-2048/docs/WORKFLOWS.md
Greg 82fc2a6691 feat: Complete PII cleanup and fully automatic pipeline
🧹 PII Cleanup & Security:
- Remove all hardcoded domains (darknex.us, hndrx.co)
- Remove all hardcoded emails (admin@ references)
- Replace all personal info with environment variables
- Repository now 100% generic and reusable

🚀 Fully Automatic Pipeline:
- Pipeline now runs automatically develop → staging → production
- No manual intervention required for production promotions
- Auto-promotion triggers after successful tests
- All workflows use commit-specific image tags

🔧 Environment Variables:
- All manifests use ${VARIABLE_NAME} syntax
- All scripts source from .env file
- GitHub Actions use secrets for sensitive data
- Complete .env.example template provided

📚 Documentation:
- New comprehensive WORKFLOWS.md with pipeline details
- New PIPELINE_QUICK_REFERENCE.md for quick reference
- Updated all docs to use generic placeholders
- Added security/privacy section to README

🔐 Security Enhancements:
- Updated .gitignore for all sensitive files
- Created PII verification script (verify-pii-removal.sh)
- Created cleanup automation script (cleanup-pii.sh)
- Repository verified PII-free and production-ready

BREAKING: Repository now requires .env configuration
- Copy .env.example to .env and configure for your environment
- Set GitHub repository secrets for CI/CD workflows
- All deployments now use environment-specific configuration
2025-07-01 17:30:26 -07:00


🔄 CI/CD Pipeline Documentation

This document describes the complete automated deployment pipeline for the Knative 2048 Game on k3s.


🎯 Pipeline Overview

Complete Automatic Flow

```mermaid
graph TD
    A[Push to develop] --> B[Build & Push Image]
    B --> C[Deploy to Development]
    C --> D[Smoke Tests Dev]
    D --> E[Auto-Promote to Staging]
    E --> F[Build & Push Staging Image]
    F --> G[Deploy to Staging]
    G --> H[Smoke Tests Staging]
    H --> I[Auto-Promote to Production]
    I --> J[Push to main]
    J --> K[Build & Push Prod Image]
    K --> L[Deploy to Production]
    L --> M[Smoke Tests Production]

    N[Manual Deploy Prod] -.-> L
    O[Manual Promote Prod] -.-> I
    P[Manual Smoke Tests] -.-> D
    P -.-> H
    P -.-> M
```

Key Principles

  • Fully Automatic: Zero manual intervention from develop to production
  • No Race Conditions: Each step waits for the previous to complete
  • Test After Deploy: Smoke tests run on newly deployed versions
  • Commit-Specific Images: Each environment uses exact commit-tagged images
  • Automatic Promotion: Successful tests trigger automatic promotion
  • Manual Override: Emergency manual deployment still available

🔧 Workflow Details

1. Build and Push Container Image (build-image.yml)

Triggers:

  • Push to main, develop, staging
  • Pull requests to these branches

What it does:

  • Builds Docker image from current commit
  • Creates commit-specific tags: {branch}-{commit-hash}
  • Pushes to GitHub Container Registry (GHCR)
  • Provides foundation for all deployments

Tags created:

  • develop-abc1234 (for develop branch)
  • staging-def5678 (for staging branch)
  • main-ghi9012 (for main branch)
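The tag scheme can be sketched as a small shell function. This is an illustration, not the contents of build-image.yml. The `image_tag` name and the slash-to-dash sanitizing are hypothetical, and the 7-character short hash is inferred from the example tags. On push events, GitHub Actions exposes the branch and commit as `GITHUB_REF_NAME` and `GITHUB_SHA`. In a `workflow_run`-triggered job such as deploy-dev.yml, however, `GITHUB_SHA` refers to the default branch, so the built commit has to come from `github.event.workflow_run.head_sha`. GHCR also requires lowercase image names, which is why `owner/repo` below is a lowercase placeholder.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the {branch}-{commit-hash} tag described above.
image_tag() {
  local branch
  # Docker tags cannot contain "/", so map names like "feature/x" or "123/merge" to dashes.
  branch=$(printf '%s' "$1" | tr '/' '-')
  # %.7s keeps the first 7 characters of the SHA, matching tags like develop-abc1234.
  printf '%s-%.7s\n' "$branch" "$2"
}

# Usage in a push-triggered workflow step (owner/repo is a placeholder):
#   TAG=$(image_tag "$GITHUB_REF_NAME" "$GITHUB_SHA")
#   docker build -t "ghcr.io/owner/repo:${TAG}" .
#   docker push "ghcr.io/owner/repo:${TAG}"
```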

2. Deploy to Development (deploy-dev.yml)

Triggers:

  • After "Build and Push Container Image" completes successfully on develop
  • Manual dispatch

What it does:

  • Waits for build to complete (no race conditions)
  • Uses exact commit-tagged image that was just built
  • Deploys via webhook to k3s development namespace
  • Sets up development environment

Dependencies:

  • Requires successful build completion
  • Uses environment secrets: DEV_WEBHOOK_URL, WEBHOOK_SECRET
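The webhook deployment step might look like the sketch below. The document does not specify the payload schema or which header carries the signature. The JSON fields (`environment`, `image`) and the `X-Signature-256: sha256=<hex>` header are therefore assumptions modeled on GitHub's own webhook signing convention, and whatever the real handler in `webhook-system` expects takes precedence. The signature is an HMAC-SHA256 over the exact request body, keyed with `WEBHOOK_SECRET`.

```bash
#!/usr/bin/env bash
# Compute "sha256=<hex>" where <hex> = HMAC-SHA256(secret, payload).
sign_payload() {
  local secret=$1 payload=$2 hex
  # openssl prints "<label>= <hex>"; the digest is the last field.
  hex=$(printf '%s' "$payload" | openssl dgst -sha256 -hmac "$secret" | awk '{print $NF}')
  printf 'sha256=%s\n' "$hex"
}

# Hypothetical deploy call: POST a signed payload to an environment's webhook.
deploy_via_webhook() {
  local url=$1 secret=$2 environment=$3 image=$4 payload sig
  payload=$(printf '{"environment":"%s","image":"%s"}' "$environment" "$image")
  sig=$(sign_payload "$secret" "$payload")
  # --fail makes curl exit non-zero on HTTP errors, so the workflow step fails too.
  curl -sS --fail -X POST \
    -H "Content-Type: application/json" \
    -H "X-Signature-256: ${sig}" \
    --data "$payload" "$url"
}

# Usage (values from repository secrets; image path is a placeholder):
#   deploy_via_webhook "$DEV_WEBHOOK_URL" "$WEBHOOK_SECRET" development \
#     "ghcr.io/owner/repo:develop-abc1234"
```

The function signs the exact bytes that curl sends. If the payload were reformatted between signing and sending, the handler's recomputed HMAC would not match and the request would be rejected.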

3. Smoke Tests (smoke-test.yml)

Triggers:

  • After any deployment completes ("Deploy to Development", "Deploy to Staging", "Deploy to Production")
  • Scheduled every 6 hours
  • Manual dispatch

What it does:

  • Tests the newly deployed version (not previous)
  • Validates canonical Knative domains
  • Checks content, performance, SSL certificates
  • Runs environment-specific tests

Environments tested:

  • 🧪 Development: Your configured development domain
  • 🎭 Staging: Your configured staging domain
  • 🚀 Production: Your configured production domain
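A single smoke check covering reachability, TLS, content and response time could look like the sketch below. The 2-second threshold and the assumption that the page body contains the string `2048` are illustrative, not values taken from smoke-test.yml. The pass/fail decision lives in `judge_response` so it can be exercised without network access.

```bash
#!/usr/bin/env bash
# Decide pass/fail from an HTTP status code and a response time in seconds.
judge_response() {
  local code=$1 seconds=$2 max_seconds=${3:-2.0}
  if [ "$code" != "200" ]; then
    echo "FAIL: HTTP $code"
    return 1
  fi
  # The shell has no float comparison, so awk compares the timings.
  if ! awk -v t="$seconds" -v max="$max_seconds" 'BEGIN { exit !(t <= max) }'; then
    echo "FAIL: slow response (${seconds}s > ${max_seconds}s)"
    return 1
  fi
  echo "PASS: HTTP $code in ${seconds}s"
}

# Fetch the canonical domain over HTTPS and run the checks.
smoke_check() {
  local domain=$1 body metrics code seconds status=0
  body=$(mktemp)
  # curl verifies the TLS certificate by default, so an invalid cert fails here.
  if ! metrics=$(curl -sS -o "$body" -w '%{http_code} %{time_total}' "https://${domain}"); then
    echo "FAIL: ${domain} unreachable"
    rm -f "$body"
    return 1
  fi
  code=${metrics%% *}
  seconds=${metrics#* }
  judge_response "$code" "$seconds" || status=1
  # Assumed content marker: the game page mentions "2048".
  if [ "$status" -eq 0 ] && ! grep -q '2048' "$body"; then
    echo "FAIL: unexpected page content"
    status=1
  fi
  rm -f "$body"
  return "$status"
}

# Usage: smoke_check "$DEV_CANONICAL_DOMAIN"
```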

4. Auto-Promote Pipeline (auto-promote.yml)

Triggers:

  • After "Smoke Tests" complete successfully on develop branch

What it does:

  • Verifies development smoke tests passed
  • Merges develop → staging automatically
  • Triggers staging deployment pipeline
  • Creates promotion summary

Safety features:

  • Only runs if smoke tests pass
  • Handles "already up to date" scenarios gracefully
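The merge step and its "already up to date" handling can be sketched with plain git. This is an assumption about how auto-promote.yml might work, not its actual contents. The function operates on local refs, so a workflow would fetch first and push afterwards, and the same logic could serve the staging → main promotion. A push made with the default `GITHUB_TOKEN` does not start new workflow runs. That may be why Deploy to Staging also listens for the Auto-Promote Pipeline completing. Pushing with a personal access token is the usual alternative.

```bash
#!/usr/bin/env bash
# Merge $src into $dst unless $dst already contains every commit on $src.
promote_branch() {
  local src=$1 dst=$2 ahead
  # Count commits reachable from src but not from dst.
  ahead=$(git rev-list --count "${dst}..${src}") || return 1
  if [ "$ahead" -eq 0 ]; then
    echo "already up to date: ${dst} contains ${src}"
    return 0
  fi
  git checkout --quiet "$dst" || return 1
  # --no-ff keeps a merge commit that records each promotion.
  git merge --no-ff --no-edit "$src" || return 1
  echo "promoted ${ahead} commit(s) from ${src} to ${dst}"
}

# Usage inside a workflow (remote names and push step are assumptions):
#   git fetch origin develop staging
#   git checkout -B staging origin/staging
#   promote_branch origin/develop staging && git push origin staging
```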

5. Deploy to Staging (deploy-staging.yml)

Triggers:

  • Push to staging branch (triggered by auto-promotion)
  • After "Auto-Promote Pipeline" completes
  • Manual dispatch

What it does:

  • Builds and deploys staging-specific image
  • Uses staging-{commit} tagged image
  • Deploys via webhook to k3s staging namespace

6. Auto-Promote to Production (promote-to-production.yml)

Triggers:

  • After "Smoke Tests" complete successfully on staging branch (AUTOMATIC)
  • Manual dispatch (emergency override only)

What it does:

  • Verifies staging smoke tests passed
  • Merges staging → main automatically
  • Triggers production deployment immediately
  • Creates production promotion summary

Automation features:

  • Runs automatically after staging tests pass
  • No manual confirmation required
  • Seamless promotion from staging to production

7. Deploy to Production (deploy-prod.yml)

Triggers:

  • Push to main branch (triggered by auto-promotion) - AUTOMATIC
  • Manual dispatch (requires typing "DEPLOY" for emergencies)

What it does:

  • Automatically deploys when main branch is updated
  • Uses main-{commit} tagged image
  • Deploys via webhook to k3s production namespace
  • Blue-green deployment strategy for zero downtime

Automation features:

  • No manual confirmation required for automatic deployments
  • Immediate deployment after staging promotion
  • Manual override still available for emergencies
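The manual-dispatch guard can be sketched as a shell check at the start of the production job. The `CONFIRM` and `IMAGE_TAG` variable names are hypothetical stand-ins for however the workflow maps its dispatch inputs. Falling back to `main-{commit}` when no tag is given is also an assumption. `GITHUB_EVENT_NAME` and `GITHUB_SHA` are standard GitHub Actions variables.

```bash
#!/usr/bin/env bash
# Resolve the image tag for a production deploy. A manual dispatch is refused
# unless the operator typed DEPLOY exactly.
resolve_prod_tag() {
  local event=$1 confirm=$2 requested_tag=$3 sha=$4
  if [ "$event" = "workflow_dispatch" ] && [ "$confirm" != "DEPLOY" ]; then
    echo "Refusing to deploy: type DEPLOY to confirm" >&2
    return 1
  fi
  # An explicit tag (e.g. for a rollback) wins; otherwise use main-{commit}.
  if [ -n "$requested_tag" ]; then
    printf '%s\n' "$requested_tag"
  else
    printf 'main-%.7s\n' "$sha"
  fi
}

# Usage in a step (CONFIRM and IMAGE_TAG mapped from dispatch inputs, assumed):
#   TAG=$(resolve_prod_tag "$GITHUB_EVENT_NAME" "$CONFIRM" "$IMAGE_TAG" "$GITHUB_SHA") || exit 1
```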

8. Deployment Status Check (deployment-status.yml)

Triggers:

  • Manual dispatch
  • Scheduled every 4 hours

What it does:

  • Checks health of all environments
  • Shows current versions deployed
  • Provides manual action options
  • Creates comprehensive status report
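A status report like this could be built as a Markdown table and appended to `$GITHUB_STEP_SUMMARY`, the file GitHub Actions renders as the job summary. In this sketch, "healthy" simply means HTTP 200, which is a simplifying assumption. The version column reads the image from the Knative service. The namespace and service names are placeholders, and the real workflow may check more.

```bash
#!/usr/bin/env bash
# Format one Markdown table row for an environment's status.
status_row() {
  local env=$1 code=$2 image=$3 health
  if [ "$code" = "200" ]; then health="healthy"; else health="unhealthy"; fi
  # The text after the last ":" is the deployed tag, e.g. main-abc1234.
  printf '| %s | %s | %s | %s |\n' "$env" "$code" "${image##*:}" "$health"
}

# Gather live data for one environment (needs network and cluster access).
report_env() {
  local env=$1 domain=$2 namespace=$3 service=$4 code image
  # curl prints 000 as the status code when the request itself fails.
  code=$(curl -s -o /dev/null -w '%{http_code}' "https://${domain}") || code="000"
  image=$(kubectl get ksvc "$service" -n "$namespace" \
    -o jsonpath='{.spec.template.spec.containers[0].image}' 2>/dev/null) || image="unknown"
  status_row "$env" "$code" "$image"
}

# Usage (namespace and service names are placeholders):
#   { echo '| Environment | HTTP | Version | Health |'
#     echo '|---|---|---|---|'
#     report_env development "$DEV_CANONICAL_DOMAIN" game-2048-dev game-2048
#   } >> "$GITHUB_STEP_SUMMARY"
```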

🎮 Manual Actions (Emergency Use Only)

Note: The pipeline is fully automatic. Manual actions are only for emergency situations or debugging.

Emergency Actions

| Action | Workflow | Required Input | Use Case |
|---|---|---|---|
| Check Status | Deployment Status Check | None | Monitor all environments |
| Test Environment | Smoke Tests | Environment (dev/staging/prod/all) | Debug a specific environment |
| Emergency Deploy | Deploy to Production | Type "DEPLOY" | Emergency production fix |
| Force Promotion | Auto-Promote to Production | None | Skip the normal promotion flow |

Emergency Procedures

1. Emergency Production Deployment

Use only if the automatic pipeline is broken.

  1. Go to Actions → "Deploy to Production"
  2. Click "Run workflow"
  3. Type "DEPLOY" in confirmation field
  4. Optionally specify image tag
  5. Click "Run workflow"

2. Force Production Promotion

Use only if auto-promotion fails.

  1. Go to Actions → "Auto-Promote to Production"
  2. Click "Run workflow"
  3. Optionally skip tests if staging has already been validated
  4. Click "Run workflow"

3. Check Deployment Status

  1. Go to Actions → "Deployment Status Check"
  2. Click "Run workflow"
  3. View results in workflow summary

4. Run Smoke Tests

  1. Go to Actions → "Smoke Tests"
  2. Click "Run workflow"
  3. Select environment to test
  4. Click "Run workflow"

⚙️ Environment Configuration

Required Secrets

| Secret | Purpose | Used By |
|---|---|---|
| GH_TOKEN | GitHub Container Registry access | Build workflows |
| WEBHOOK_SECRET | Webhook signature validation | All deployment workflows |
| DEV_WEBHOOK_URL | Development deployment endpoint | Deploy to Development |
| STAGING_WEBHOOK_URL | Staging deployment endpoint | Deploy to Staging |
| PROD_WEBHOOK_URL | Production deployment endpoint | Deploy to Production |
| DEV_DOMAIN | Development domain suffix | Smoke Tests |
| STAGING_DOMAIN | Staging domain suffix | Smoke Tests |
| PROD_DOMAIN | Production domain suffix | Smoke Tests |

Environment URLs

| Environment | Canonical Domain |
|---|---|
| Development | https://${DEV_CANONICAL_DOMAIN} |
| Staging | https://${STAGING_CANONICAL_DOMAIN} |
| Production | https://${PROD_CANONICAL_DOMAIN} |

Image Tagging Strategy

| Branch | Tag Format | Example | Environment |
|---|---|---|---|
| develop | develop-{commit} | develop-abc1234 | Development |
| staging | staging-{commit} | staging-def5678 | Staging |
| main | main-{commit} | main-ghi9012 | Production |

🔍 Troubleshooting

Common Issues

Pipeline Not Triggering

Symptoms: New commit pushed but no workflows start.

Causes:

  • Workflow file syntax error
  • Missing required secrets
  • Branch protection rules blocking

Solutions:

  1. Check workflow syntax in .github/workflows/
  2. Verify all secrets are set in repository settings
  3. Check Actions tab for error messages

Deployment Fails

Symptoms: Deployment workflow fails.

Causes:

  • Webhook endpoint unreachable
  • Invalid webhook signature
  • k3s cluster issues
  • Image not found

Solutions:

  1. Check webhook handler logs: kubectl logs -n webhook-system deployment/webhook-handler
  2. Verify webhook secret matches between GitHub and cluster
  3. Confirm image exists in GHCR
  4. Check k3s cluster health

Smoke Tests Fail

Symptoms: Tests report the environment as unreachable.

Causes:

  • DNS resolution issues
  • SSL certificate problems
  • Application not responding
  • Ingress configuration issues

Solutions:

  1. Test domains manually: curl -I https://${DEV_CANONICAL_DOMAIN}
  2. Check Knative service status: kubectl get ksvc -A
  3. Verify ingress configuration: kubectl get ingress -A
  4. Check certificate status: kubectl get certificates -A

Auto-Promotion Not Working

Symptoms: Tests pass but promotion doesn't happen.

Causes:

  • Workflow permission issues
  • No new commits to merge
  • Dependency chain broken

Solutions:

  1. Check workflow permissions in repository settings
  2. Verify branch protection rules
  3. Check workflow run logs in Actions tab
  4. Fall back to manual promotion

Debug Commands

```bash
# Check all environments
kubectl get all -A | grep game-2048

# Check webhook handler
kubectl logs -n webhook-system deployment/webhook-handler --tail=50

# Check Knative services
kubectl get ksvc -A

# Check ingress
kubectl get ingress -A

# Test webhook endpoint (unsigned, so expect the handler to reject it;
# any HTTP response still shows the endpoint is reachable)
curl -X POST -H "Content-Type: application/json" \
  -d '{"test": "true"}' \
  https://your-webhook-url/webhook

# Check DNS resolution
dig "${DEV_CANONICAL_DOMAIN}"

# Test SSL certificate (</dev/null lets s_client exit instead of waiting for input)
openssl s_client -servername "${DEV_CANONICAL_DOMAIN}" \
  -connect "${DEV_CANONICAL_DOMAIN}:443" </dev/null 2>/dev/null |
  openssl x509 -noout -subject -dates
```

Emergency Procedures

Rollback Production

  1. Identify last known good commit/tag
  2. Run "Deploy to Production" manually
  3. Specify the good image tag
  4. Type "DEPLOY" to confirm
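For step 1, candidate tags can be listed from main's history. This assumes every commit on main was built and pushed under the `main-{commit}` scheme with a 7-character short hash. `--first-parent` follows main's own promotion merges rather than the individual commits merged in from staging. The function name is hypothetical. Confirm that a tag actually exists in GHCR before deploying it.

```bash
#!/usr/bin/env bash
# Print the newest N main-{commit} tags on a ref, newest first (default: 5 on main).
recent_main_tags() {
  local count=${1:-5} ref=${2:-main}
  git log --first-parent --format='%H' -n "$count" "$ref" |
    while read -r sha; do
      # Same 7-character truncation as the build tags.
      printf 'main-%.7s\n' "$sha"
    done
}

# Usage: the first line is usually what production runs now; pick an older
# line as the rollback target.
#   recent_main_tags 5 origin/main
```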

Skip Failed Tests

  1. Run "Auto-Promote to Production" manually
  2. Type "PROMOTE" to confirm
  3. Enable "Skip tests" if staging has already been validated

Force Promotion

  1. Manually merge branches using git
  2. Push to trigger deployments
  3. Monitor via "Deployment Status Check"


Last updated: 2025-01-01 16:00:00 UTC