🔄 CI/CD Pipeline Documentation
This document describes the complete automated deployment pipeline for the Knative 2048 Game on k3s.
🎯 Pipeline Overview
Complete Automatic Flow
graph TD
A[Push to develop] --> B[Build & Push Image]
B --> C[Deploy to Development]
C --> D[Smoke Tests Dev]
D --> E[Auto-Promote to Staging]
E --> F[Build & Push Staging Image]
F --> G[Deploy to Staging]
G --> H[Smoke Tests Staging]
H --> I[Auto-Promote to Production]
I --> J[Push to main]
J --> K[Build & Push Prod Image]
K --> L[Deploy to Production]
L --> M[Smoke Tests Production]
N[Manual Deploy Prod] -.-> L
O[Manual Promote Prod] -.-> I
P[Manual Smoke Tests] -.-> D
P -.-> H
P -.-> M
Key Principles
- Fully Automatic: Zero manual intervention from develop to production
- No Race Conditions: Each step waits for the previous to complete
- Test After Deploy: Smoke tests run on newly deployed versions
- Commit-Specific Images: Each environment uses exact commit-tagged images
- Automatic Promotion: Successful tests trigger automatic promotion
- Manual Override: Emergency manual deployment still available
🔧 Workflow Details
1. Build and Push Container Image (build-image.yml)
Triggers:
- Push to main, develop, staging
- Pull requests to these branches
What it does:
- Builds Docker image from current commit
- Creates commit-specific tags: {branch}-{commit-hash}
- Pushes to GitHub Container Registry (GHCR)
- Provides foundation for all deployments
Tags created:
- develop-abc1234 (for develop branch)
- staging-def5678 (for staging branch)
- main-ghi9012 (for main branch)
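As a rough illustration of the tag format, the helper below derives a {branch}-{commit-hash} tag in shell. The function name image_tag and the 7-character short hash are assumptions inferred from the examples above; the actual workflow may compute the tag differently.

```shell
# Hypothetical helper: build an image tag of the form {branch}-{short-hash}.
# Assumes a 7-character short hash, matching the examples in this document.
image_tag() {
  branch=$1
  commit=$2
  # %.7s truncates the commit hash to its first 7 characters
  printf '%s-%.7s\n' "$branch" "$commit"
}

# Example usage (not executed here):
#   image_tag develop "$(git rev-parse HEAD)"
```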
2. Deploy to Development (deploy-dev.yml)
Triggers:
- After "Build and Push Container Image" completes successfully on develop
- Manual dispatch
What it does:
- Waits for build to complete (no race conditions)
- Uses exact commit-tagged image that was just built
- Deploys via webhook to k3s development namespace
- Sets up development environment
Dependencies:
- Requires successful build completion
- Uses environment secrets: DEV_WEBHOOK_URL, WEBHOOK_SECRET
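To make the webhook step more concrete, here is a minimal sketch of how a signed deployment request could be sent. It assumes the handler validates an HMAC-SHA256 signature of the request body. The payload shape, the X-Signature-256 header name, and the function names are hypothetical and are not taken from deploy-dev.yml.

```shell
# Hypothetical sketch: HMAC-SHA256 signing of a deployment payload.
# The payload fields and the X-Signature-256 header name are assumptions.
sign_payload() {
  payload=$1
  secret=$2
  # openssl prints "<label>= <hex>"; keep only the hex digest
  printf '%s' "$payload" | openssl dgst -sha256 -hmac "$secret" | awk '{print $NF}'
}

send_deploy() {
  image=$1
  payload=$(printf '{"image":"%s"}' "$image")
  signature=$(sign_payload "$payload" "$WEBHOOK_SECRET")
  # -f makes curl return non-zero on HTTP errors so the step fails visibly
  curl -fsS -X POST "$DEV_WEBHOOK_URL" \
    -H "Content-Type: application/json" \
    -H "X-Signature-256: sha256=$signature" \
    -d "$payload"
}
```

If the secret differs between GitHub and the cluster, the two sides compute different digests for the same body, which is why a mismatch shows up as an invalid signature.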
3. Smoke Tests (smoke-test.yml)
Triggers:
- After any deployment completes ("Deploy to Development", "Deploy to Staging", "Deploy to Production")
- Scheduled every 6 hours
- Manual dispatch
What it does:
- Tests the newly deployed version (not previous)
- Validates canonical Knative domains
- Checks content, performance, SSL certificates
- Runs environment-specific tests
Environments tested:
- 🧪 Development: Your configured development domain
- 🎭 Staging: Your configured staging domain
- 🚀 Production: Your configured production domain
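A single smoke check can be approximated as below. This is only a sketch of the idea (request the domain, accept a 2xx or 3xx status) and not the actual contents of smoke-test.yml; the function names and the 10-second timeout are assumptions.

```shell
# Hypothetical sketch of one smoke check; not the real smoke-test.yml logic.
is_healthy_status() {
  # Treat any 2xx or 3xx HTTP status as healthy
  case "$1" in
    2??|3??) return 0 ;;
    *) return 1 ;;
  esac
}

smoke_check() {
  url=$1
  # -w prints the status code; curl reports 000 when the request fails outright
  code=$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$url")
  if is_healthy_status "$code"; then
    echo "PASS $url ($code)"
  else
    echo "FAIL $url ($code)"
    return 1
  fi
}

# Example usage (not executed here):
#   smoke_check "https://${DEV_CANONICAL_DOMAIN}"
```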
4. Auto-Promote Pipeline (auto-promote.yml)
Triggers:
- After "Smoke Tests" complete successfully on develop branch
What it does:
- Verifies development smoke tests passed
- Merges develop → staging automatically
- Triggers staging deployment pipeline
- Creates promotion summary
Safety features:
- Only runs if smoke tests pass
- Handles "already up to date" scenarios gracefully
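The promotion step can be pictured roughly as follows. This is a sketch under assumptions: the remote is named origin, the merge is a plain git merge, and the helper names are hypothetical. The real auto-promote.yml may use different commands.

```shell
# Hypothetical sketch of a branch promotion with an "already up to date" guard.
promotion_action() {
  # $1 = number of commits on the source branch that the target lacks
  if [ "$1" -gt 0 ]; then
    echo merge
  else
    echo skip
  fi
}

promote() {
  src=$1
  dst=$2
  git fetch -q origin "$src" "$dst" || return 1
  ahead=$(git rev-list --count "origin/$dst..origin/$src") || return 1
  if [ "$(promotion_action "$ahead")" = skip ]; then
    echo "$dst is already up to date with $src"
    return 0
  fi
  # Recreate the local target from the remote, merge, and push
  git checkout -q -B "$dst" "origin/$dst" &&
    git merge -q --no-edit "origin/$src" &&
    git push -q origin "$dst"
}

# Example usage (not executed here):
#   promote develop staging
```

Counting commits before merging lets the job exit cleanly when there is nothing to promote, instead of failing on an empty merge.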
5. Deploy to Staging (deploy-staging.yml)
Triggers:
- Push to staging branch (triggered by auto-promotion)
- After "Auto-Promote Pipeline" completes
- Manual dispatch
What it does:
- Builds and deploys staging-specific image
- Uses staging-{commit} tagged image
- Deploys via webhook to k3s staging namespace
6. Auto-Promote to Production (promote-to-production.yml)
Triggers:
- After "Smoke Tests" complete successfully on staging branch (AUTOMATIC)
- Manual dispatch (emergency override only)
What it does:
- Verifies staging smoke tests passed
- Merges staging → main automatically
- Triggers production deployment immediately
- Creates production promotion summary
Automation features:
- Runs automatically after staging tests pass
- No manual confirmation required
- Seamless promotion from staging to production
7. Deploy to Production (deploy-prod.yml)
Triggers:
- Push to main branch (triggered by auto-promotion) - AUTOMATIC
- Manual dispatch (requires typing "DEPLOY" for emergencies)
What it does:
- Automatically deploys when main branch is updated
- Uses main-{commit} tagged image
- Deploys via webhook to k3s production namespace
- Blue-green deployment strategy for zero downtime
Automation features:
- No manual confirmation required for automatic deployments
- Immediate deployment after staging promotion
- Manual override still available for emergencies
8. Deployment Status Check (deployment-status.yml)
Triggers:
- Manual dispatch
- Scheduled every 4 hours
What it does:
- Checks health of all environments
- Shows current versions deployed
- Provides manual action options
- Creates comprehensive status report
🎮 Manual Actions (Emergency Use Only)
Note: The pipeline is fully automatic. Manual actions are only for emergency situations or debugging.
Emergency Actions
| Action | Workflow | Required Input | Use Case |
|---|---|---|---|
| Check Status | Deployment Status Check | None | Monitor all environments |
| Test Environment | Smoke Tests | Environment (dev/staging/prod/all) | Debug specific environment |
| Emergency Deploy | Deploy to Production | Type "DEPLOY" | Emergency production fix |
| Force Promotion | Auto-Promote to Production | None | Skip normal promotion flow |
Emergency Procedures
1. Emergency Production Deployment
Use only if automatic pipeline is broken
- Go to Actions → "Deploy to Production"
- Click "Run workflow"
- Type "DEPLOY" in confirmation field
- Optionally specify image tag
- Click "Run workflow"
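The same dispatch can also be started from a terminal with the GitHub CLI. The sketch below is illustrative only: the workflow input names confirmation and image_tag are assumptions, so check the workflow_dispatch inputs in deploy-prod.yml before relying on them.

```shell
# Hypothetical wrapper around the manual "Deploy to Production" dispatch.
# The input names "confirmation" and "image_tag" are assumptions.
confirmed() {
  [ "$1" = "DEPLOY" ]
}

emergency_deploy() {
  confirmation=$1
  tag=$2
  if ! confirmed "$confirmation"; then
    echo "Refusing to deploy: type DEPLOY to confirm" >&2
    return 1
  fi
  # gh workflow run dispatches the workflow; -f passes workflow inputs
  gh workflow run deploy-prod.yml --ref main \
    -f confirmation="$confirmation" \
    -f image_tag="$tag"
}

# Example usage (not executed here):
#   emergency_deploy DEPLOY main-ghi9012
```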
2. Force Production Promotion
Use only if auto-promotion fails
- Go to Actions → "Auto-Promote to Production"
- Click "Run workflow"
- Optionally skip tests if staging already validated
- Click "Run workflow"
3. Check Deployment Status
- Go to Actions → "Deployment Status Check"
- Click "Run workflow"
- View results in workflow summary
4. Run Smoke Tests
- Go to Actions → "Smoke Tests"
- Click "Run workflow"
- Select environment to test
- Click "Run workflow"
⚙️ Environment Configuration
Required Secrets
| Secret | Purpose | Used By |
|---|---|---|
| GH_TOKEN | GitHub Container Registry access | Build workflows |
| WEBHOOK_SECRET | Webhook signature validation | All deployment workflows |
| DEV_WEBHOOK_URL | Development deployment endpoint | Deploy to Development |
| STAGING_WEBHOOK_URL | Staging deployment endpoint | Deploy to Staging |
| PROD_WEBHOOK_URL | Production deployment endpoint | Deploy to Production |
| DEV_DOMAIN | Development domain suffix | Smoke Tests |
| STAGING_DOMAIN | Staging domain suffix | Smoke Tests |
| PROD_DOMAIN | Production domain suffix | Smoke Tests |
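A small pre-flight check can catch a missing value before a workflow step runs. The sketch below assumes the secrets have been mapped into environment variables with the same names (for example via an env block in the workflow); the function name missing_vars is hypothetical.

```shell
# Hypothetical pre-flight check: print the name of every listed variable
# that is unset or empty in the current environment.
# Only pass trusted, literal variable names, since eval is used for lookup.
missing_vars() {
  for name in "$@"; do
    eval "value=\${$name-}"
    [ -n "$value" ] || echo "$name"
  done
}

# Example usage (not executed here):
#   missing=$(missing_vars WEBHOOK_SECRET DEV_WEBHOOK_URL DEV_DOMAIN)
#   [ -z "$missing" ] || { echo "Missing: $missing" >&2; exit 1; }
```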
Environment URLs
| Environment | Canonical Domain |
|---|---|
| Development | https://${DEV_CANONICAL_DOMAIN} |
| Staging | https://${STAGING_CANONICAL_DOMAIN} |
| Production | https://${PROD_CANONICAL_DOMAIN} |
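For scripts that need these URLs, a lookup like the one below keeps the environment-to-variable mapping in one place. It is a sketch that assumes the three *_CANONICAL_DOMAIN variables are exported (for example from .env); the function name env_url is hypothetical.

```shell
# Hypothetical helper: resolve an environment name to its canonical URL.
env_url() {
  case "$1" in
    development) domain=${DEV_CANONICAL_DOMAIN-} ;;
    staging)     domain=${STAGING_CANONICAL_DOMAIN-} ;;
    production)  domain=${PROD_CANONICAL_DOMAIN-} ;;
    *) echo "unknown environment: $1" >&2; return 1 ;;
  esac
  if [ -z "$domain" ]; then
    echo "domain for $1 is not set" >&2
    return 1
  fi
  printf 'https://%s\n' "$domain"
}
```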
Image Tagging Strategy
| Branch | Tag Format | Example | Environment |
|---|---|---|---|
| develop | develop-{commit} | develop-abc1234 | Development |
| staging | staging-{commit} | staging-def5678 | Staging |
| main | main-{commit} | main-ghi9012 | Production |
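Because the branch name is the tag prefix, the target environment can be read back from a tag. The helper below sketches that mapping using the table above; the function name env_for_tag is hypothetical.

```shell
# Hypothetical helper: map an image tag back to its target environment,
# based on the branch prefix described in the tagging table.
env_for_tag() {
  case "$1" in
    develop-*) echo Development ;;
    staging-*) echo Staging ;;
    main-*)    echo Production ;;
    *) echo "unrecognized tag: $1" >&2; return 1 ;;
  esac
}
```

A check like this can guard a manual deployment against, say, a develop-tagged image being sent to production by mistake.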
🔍 Troubleshooting
Common Issues
Pipeline Not Triggering
Symptoms: New commit pushed but no workflows start
Causes:
- Workflow file syntax error
- Missing required secrets
- Branch protection rules blocking
Solutions:
- Check workflow syntax in .github/workflows/
- Verify all secrets are set in repository settings
- Check Actions tab for error messages
Deployment Fails
Symptoms: Deployment workflow fails
Causes:
- Webhook endpoint unreachable
- Invalid webhook signature
- k3s cluster issues
- Image not found
Solutions:
- Check webhook handler logs: kubectl logs -n webhook-system deployment/webhook-handler
- Verify webhook secret matches between GitHub and cluster
- Confirm image exists in GHCR
- Check k3s cluster health
Smoke Tests Fail
Symptoms: Tests report environment unreachable
Causes:
- DNS resolution issues
- SSL certificate problems
- Application not responding
- Ingress configuration issues
Solutions:
- Test domains manually: curl -I https://${DEV_CANONICAL_DOMAIN}
- Check Knative service status: kubectl get ksvc -A
- Verify ingress configuration: kubectl get ingress -A
- Check certificate status: kubectl get certificates -A
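When SSL problems are suspected, it can help to check how long the served certificate remains valid. The following is a minimal sketch using standard openssl subcommands; the function names and the choice of a day-based threshold are assumptions.

```shell
# Hypothetical helpers for checking certificate expiry from the command line.
days_to_seconds() {
  echo $(( $1 * 86400 ))
}

cert_valid_for() {
  host=$1
  days=$2
  # s_client fetches the served certificate; x509 -checkend exits non-zero
  # if the certificate expires within the given number of seconds
  openssl s_client -servername "$host" -connect "$host:443" </dev/null 2>/dev/null |
    openssl x509 -noout -checkend "$(days_to_seconds "$days")"
}

# Example usage (not executed here):
#   cert_valid_for "${DEV_CANONICAL_DOMAIN}" 7 || echo "certificate expires within 7 days"
```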
Auto-Promotion Not Working
Symptoms: Tests pass but promotion doesn't happen
Causes:
- Workflow permission issues
- No new commits to merge
- Dependency chain broken
Solutions:
- Check workflow permissions in repository settings
- Verify branch protection rules
- Check workflow run logs in Actions tab
- Manual promotion as fallback
Debug Commands
# Check all environments
kubectl get all -A | grep game-2048
# Check webhook handler
kubectl logs -n webhook-system deployment/webhook-handler --tail=50
# Check Knative services
kubectl get ksvc -A
# Check ingress
kubectl get ingress -A
# Test webhook endpoint
curl -X POST -H "Content-Type: application/json" \
-d '{"test": "true"}' \
https://your-webhook-url/webhook
# Check DNS resolution
dig ${DEV_CANONICAL_DOMAIN}
# Test SSL certificate
openssl s_client -servername ${DEV_CANONICAL_DOMAIN} \
-connect ${DEV_CANONICAL_DOMAIN}:443
Emergency Procedures
Rollback Production
- Identify last known good commit/tag
- Run "Deploy to Production" manually
- Specify the good image tag
- Type "DEPLOY" to confirm
Skip Failed Tests
- Run "Auto-Promote to Production" manually
- Type "PROMOTE" to confirm
- Enable "Skip tests" if staging already validated
Force Promotion
- Manually merge branches using git
- Push to trigger deployments
- Monitor via "Deployment Status Check"
📚 Related Documentation
Last updated: 2025-01-01 16:00:00 UTC