feat: Terraform Foundation - AWS Landing Zone

Enterprise-grade multi-tenant AWS cloud foundation.

Modules:
- GitHub OIDC for keyless CI/CD authentication
- IAM account settings and security baseline
- AWS Config Rules for compliance
- ABAC (Attribute-Based Access Control)
- SCPs (Service Control Policies)

Features:
- Multi-account architecture
- Cost optimization patterns
- Security best practices
- Comprehensive documentation

Tech: Terraform, AWS Organizations, IAM Identity Center
2026-02-01 20:06:28 +00:00
commit 6136cde9bb
145 changed files with 30832 additions and 0 deletions

docs/COST-OPTIMIZATION.md Normal file

@@ -0,0 +1,212 @@
# Cost Optimization Guide
This document outlines cost-saving strategies implemented in this foundation and recommendations for further optimization.
## Built-In Cost Savings
### 1. Shared VPC Architecture
**Savings: ~$160/month per 3 tenants**
| Approach | NAT Gateways | Monthly Cost |
|----------|--------------|--------------|
| VPC per tenant (3 tenants, 2 AZ) | 6 | ~$192 |
| **Shared VPC (single NAT)** | 1 | ~$32 |
The shared VPC with tenant isolation via security groups provides the same logical separation at a fraction of the cost.
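
The arithmetic behind the table can be sketched as Terraform locals. The ~$32 figure is the table's approximate monthly charge for one NAT Gateway; the local and output names are illustrative, not part of this foundation, and data-processing charges are ignored.

```hcl
# Illustrative only: names are hypothetical, prices are the table's estimates.
locals {
  nat_monthly_usd = 32 # approx. fixed charge per NAT Gateway, excludes data processing
  tenant_count    = 3
  az_count        = 2

  # One NAT Gateway per tenant per AZ: 3 * 2 * 32 = 192
  vpc_per_tenant_cost = local.tenant_count * local.az_count * local.nat_monthly_usd

  # One shared NAT Gateway: 1 * 32 = 32
  shared_vpc_cost = 1 * local.nat_monthly_usd

  # 192 - 32 = 160
  nat_monthly_savings = local.vpc_per_tenant_cost - local.shared_vpc_cost
}

output "nat_monthly_savings_usd" {
  value = local.nat_monthly_savings
}
```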
### 2. Single NAT Gateway
For non-production or cost-sensitive workloads:
```hcl
# terraform/02-network/main.tf
variable "enable_nat" {
  type    = bool
  default = true # Set to false to save ~$32/mo (no private subnet egress)
}
**Alternative**: NAT Instance (~$3/mo for t4g.nano) for dev environments.
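
A minimal sketch of how a single NAT Gateway might be gated on `enable_nat`. The resource names, `aws_subnet.public`, and `aws_route_table.private` are assumptions for illustration and may not match what `terraform/02-network/main.tf` actually defines.

```hcl
# Sketch only: aws_subnet.public and aws_route_table.private are assumed to exist elsewhere.
resource "aws_eip" "nat" {
  count  = var.enable_nat ? 1 : 0
  domain = "vpc" # AWS provider v5+; older versions used `vpc = true`
}

resource "aws_nat_gateway" "shared" {
  count         = var.enable_nat ? 1 : 0
  allocation_id = aws_eip.nat[0].id
  subnet_id     = aws_subnet.public[0].id # single AZ: cheaper, but not AZ-redundant
}

resource "aws_route" "private_egress" {
  count                  = var.enable_nat ? 1 : 0
  route_table_id         = aws_route_table.private.id
  destination_cidr_block = "0.0.0.0/0"
  nat_gateway_id         = aws_nat_gateway.shared[0].id
}
```

With `enable_nat = false` none of the three resources are created, so private subnets have no default route to the internet.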
### 3. GP3 Storage (Default)
All EBS and RDS storage uses GP3:
- Up to 20% cheaper per GB than GP2
- 3,000 IOPS and 125 MiB/s included regardless of volume size (GP2 provides 3 IOPS/GB, with a 100 IOPS minimum)
- Configurable IOPS and throughput
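
For reference, a standalone GP3 volume could look like the sketch below. The resource name, availability zone, and size are placeholders; `iops` and `throughput` are set to the GP3 baseline and can be raised independently of size.

```hcl
# Sketch only: name, AZ, and size are placeholders.
resource "aws_ebs_volume" "data" {
  availability_zone = "us-east-1a"
  size              = 100 # GiB
  type              = "gp3"
  iops              = 3000 # GP3 baseline, included in the volume price
  throughput        = 125  # MiB/s, GP3 baseline
  encrypted         = true
}
```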
### 4. Fargate Spot (ECS)
```hcl
# Configured in ECS template
default_capacity_provider_strategy {
  base              = 1 # First task always runs on this provider
  weight            = 100
  capacity_provider = "FARGATE" # Change to FARGATE_SPOT for up to 70% savings
}
```
**Savings**: Up to 70% on Fargate compute.
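
One way to keep an On-Demand baseline while still using Spot is to declare both capacity providers. This is a sketch under assumptions: `aws_ecs_cluster.this` and the 1:3 weighting are illustrative and not taken from the ECS template.

```hcl
# Sketch only: aws_ecs_cluster.this is assumed to be defined elsewhere.
resource "aws_ecs_cluster_capacity_providers" "this" {
  cluster_name       = aws_ecs_cluster.this.name
  capacity_providers = ["FARGATE", "FARGATE_SPOT"]

  # The first task is always placed on On-Demand Fargate.
  default_capacity_provider_strategy {
    capacity_provider = "FARGATE"
    base              = 1
    weight            = 1
  }

  # Beyond the base, roughly three of every four tasks go to Spot.
  default_capacity_provider_strategy {
    capacity_provider = "FARGATE_SPOT"
    weight            = 3
  }
}
```

Only one strategy entry may set `base`, which is why it appears on the On-Demand provider.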
### 5. EKS Spot Instances
```hcl
# Uncomment in EKS template
node_groups = {
  spot = {
    instance_types = ["t3.medium", "t3.large", "t3a.medium"] # Diversify!
    capacity_type  = "SPOT"
    # ...
  }
}
```
**Savings**: Up to 90% on EC2 compute.
### 6. S3 Lifecycle Tiering
For logs bucket (already configured):
```hcl
lifecycle_configuration {
  rule {
    transition {
      days          = 90
      storage_class = "GLACIER"
    }
    expiration {
      days = 2555 # 7 years
    }
  }
}
```
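
Written as a standalone AWS provider resource, the same rule might look like the following sketch. `aws_s3_bucket.logs` and the rule `id` are assumed names; the snippet above may instead be a module input with a different shape.

```hcl
# Sketch only: aws_s3_bucket.logs is assumed to be defined elsewhere.
resource "aws_s3_bucket_lifecycle_configuration" "logs" {
  bucket = aws_s3_bucket.logs.id

  rule {
    id     = "archive-then-expire"
    status = "Enabled"

    filter {} # empty filter applies the rule to every object

    transition {
      days          = 90
      storage_class = "GLACIER"
    }

    expiration {
      days = 2555 # 7 years
    }
  }
}
```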
### 7. CloudWatch Log Retention
All log groups configured with retention (default 30 days):
```hcl
retention_in_days = 30 # Adjust based on compliance needs
```
**Cost**: ~$0.50/GB for ingestion plus ~$0.03/GB/month for storage (us-east-1 list prices). Retention settings reduce only the storage portion.
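
Retention can also vary by environment. The sketch below assumes `local.env` is defined elsewhere, as in the other examples in this guide; the log group name and the per-environment values are illustrative.

```hcl
# Sketch only: local.env is assumed to be defined elsewhere; values are illustrative.
locals {
  log_retention_days = {
    prod    = 90
    staging = 30
    dev     = 14
  }
}

resource "aws_cloudwatch_log_group" "app" {
  name              = "/example/${local.env}/app"
  retention_in_days = local.log_retention_days[local.env]
}
```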
## Recommendations
### Compute Right-Sizing
1. **Start Small**: Use `t3.micro` or `t3.small` for non-prod
2. **Monitor**: Use CloudWatch Container Insights / Compute Optimizer
3. **Scale Down**: Reduce replica counts in dev/staging
### Reserved Capacity
| Resource | Savings | Commitment |
|----------|---------|------------|
| EC2 Reserved | 30-72% | 1-3 years |
| RDS Reserved | 30-60% | 1-3 years |
| Savings Plans (Compute) | 20-66% | 1-3 years |
| ElastiCache Reserved | 30-55% | 1-3 years |
**Recommendation**: After 3 months of stable usage, purchase Compute Savings Plans.
### Database Optimization
1. **Aurora Serverless v2**: For variable workloads (scales to 0.5 ACU)
2. **RDS Proxy**: Pool connections, reduce instance size
3. **Read Replicas**: Only for read-heavy workloads
4. **Stop Dev Databases**: Use Lambda to stop/start on schedule
```hcl
# Example: Smaller dev database
locals {
  instance_class = local.env == "prod" ? "db.r6g.large" : "db.t3.micro"
}
```
### Networking
1. **VPC Endpoints**: Gateway endpoint for S3 (no hourly charge); interface endpoints for ECR, Secrets Manager (~$7/mo each per AZ, but saves NAT data-processing costs)
2. **PrivateLink**: For high-volume AWS service access
3. **CloudFront**: Cache static content, reduce origin load
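
A sketch of the two endpoint types from item 1. The `region` variable and the references to `aws_vpc.shared`, `aws_route_table.private`, `aws_subnet.private`, and `aws_security_group.endpoints` are assumptions for illustration.

```hcl
# Sketch only: the VPC, route table, subnets, and security group are assumed to exist elsewhere.
variable "region" {
  type = string
}

# Gateway endpoint: attached to route tables, no hourly charge.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = aws_vpc.shared.id
  service_name      = "com.amazonaws.${var.region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = [aws_route_table.private.id]
}

# Interface endpoint: one ENI per subnet, billed hourly per AZ.
resource "aws_vpc_endpoint" "ecr_api" {
  vpc_id              = aws_vpc.shared.id
  service_name        = "com.amazonaws.${var.region}.ecr.api"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = aws_subnet.private[*].id
  security_group_ids  = [aws_security_group.endpoints.id]
  private_dns_enabled = true
}
```

Pulling images privately from ECR typically also needs an `ecr.dkr` interface endpoint, since image layers are served from S3 and the registry API is split across two services.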
### Monitoring Cost Control
```hcl
# Reduce metric granularity in non-prod
enhanced_monitoring_interval = local.env == "prod" ? 60 : 0
# Disable Performance Insights in dev
performance_insights = local.env != "dev"
```
### EKS Specific
1. **Karpenter**: Better bin-packing than Cluster Autoscaler
2. **Bottlerocket OS**: Smaller footprint, faster boot
3. **Fargate for Batch**: No idle nodes
## Cost Monitoring
### AWS Tools
1. **Cost Explorer**: Built-in, tag-based analysis
2. **Budgets**: Already configured per-tenant
3. **Cost Anomaly Detection**: ML-based alerts
### Third-Party
1. **Infracost**: PR-level cost estimation (in Makefile)
2. **Kubecost**: Kubernetes cost allocation
3. **Spot.io**: Spot instance management
## Environment-Based Defaults
```hcl
locals {
  # Automatically scale down non-prod
  instance_class = {
    prod    = "db.r6g.large"
    staging = "db.t3.small"
    dev     = "db.t3.micro"
  }[local.env]

  desired_count = {
    prod    = 3
    staging = 2
    dev     = 1
  }[local.env]

  multi_az = local.env == "prod"
}
```
## Estimated Monthly Costs
### Minimal Setup (Dev/POC)
| Resource | Spec | Est. Cost |
|----------|------|-----------|
| NAT Gateway | 1 | $32 |
| RDS | db.t3.micro | $13 |
| ECS Fargate | 0.25 vCPU, 0.5GB x 2 | $15 |
| ALB | 1 | $16 |
| S3 + CloudWatch | Minimal | $5 |
| **Total** | | **~$80/mo** |
### Production (Small)
| Resource | Spec | Est. Cost |
|----------|------|-----------|
| NAT Gateway | 1 | $32 |
| RDS | db.r6g.large, Multi-AZ | $350 |
| ECS Fargate | 1 vCPU, 2GB x 4 | $120 |
| ALB | 1 | $25 |
| EKS | Control plane | $73 |
| EKS Nodes | 2x t3.medium | $60 |
| S3 + CloudWatch | Moderate | $30 |
| **Total** | | **~$690/mo** |
### Production (With Savings Plans)
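Same as above with a 1-year Compute Savings Plan plus RDS Reserved Instances: **~$480/mo** (~30% savings). A Compute Savings Plan alone discounts only the Fargate and EC2 line items (~$180/mo above), not RDS, NAT Gateway, or ALB.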

docs/SECURITY.md Normal file

@@ -0,0 +1,170 @@
# Security Architecture
This document outlines the security controls implemented in this Terraform foundation. These controls align with common compliance frameworks (HIPAA, SOC 2, ISO 27001, HITRUST) without being prescriptive to any specific framework.
## Encryption
### At Rest
| Resource | Encryption | Key Management |
|----------|------------|----------------|
| S3 Buckets | SSE-KMS | Customer-managed KMS keys |
| RDS/Aurora | AES-256 | Customer-managed KMS keys |
| EBS Volumes | AES-256 | Customer-managed KMS keys |
| DynamoDB | AES-256 | Customer-managed KMS keys |
| EKS Secrets | Envelope encryption | Customer-managed KMS keys |
| Secrets Manager | AES-256 | AWS-managed or customer KMS |
### In Transit
| Resource | Protocol | Enforcement |
|----------|----------|-------------|
| S3 | TLS 1.2+ | Bucket policy denies non-HTTPS |
| RDS | TLS 1.2+ | `ca_cert_identifier` configured; reject non-TLS connections via the parameter group (e.g. `rds.force_ssl` on PostgreSQL) |
| ALB | TLS 1.2+ | HTTPS listeners with modern policy |
| EKS API | TLS 1.2+ | AWS-managed certificates |
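
The S3 row above can be enforced with a bucket policy along the following lines. This is a sketch, assuming a bucket resource named `aws_s3_bucket.logs`; the actual policy in this foundation may differ.

```hcl
# Sketch only: aws_s3_bucket.logs is assumed to be defined elsewhere.
data "aws_iam_policy_document" "deny_insecure_transport" {
  statement {
    sid     = "DenyInsecureTransport"
    effect  = "Deny"
    actions = ["s3:*"]
    resources = [
      aws_s3_bucket.logs.arn,
      "${aws_s3_bucket.logs.arn}/*",
    ]

    principals {
      type        = "*"
      identifiers = ["*"]
    }

    # Matches any request that did not arrive over HTTPS.
    condition {
      test     = "Bool"
      variable = "aws:SecureTransport"
      values   = ["false"]
    }
  }
}

resource "aws_s3_bucket_policy" "logs" {
  bucket = aws_s3_bucket.logs.id
  policy = data.aws_iam_policy_document.deny_insecure_transport.json
}
```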
## Access Control
### Network Isolation
```
┌─────────────────────────────────────────────────────────────┐
│ Shared VPC │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ Public Subnet │ │ Public Subnet │ ← ALB only │
│ │ (AZ-a) │ │ (AZ-b) │ │
│ └────────┬────────┘ └────────┬────────┘ │
│ │ │ │
│ ┌────────▼────────┐ ┌────────▼────────┐ │
│ │ Private Subnet │ │ Private Subnet │ ← Workloads │
│ │ (AZ-a) │ │ (AZ-b) │ (no public IP) │
│ └─────────────────┘ └─────────────────┘ │
│ │
│ Default SG: DENY ALL (no rules) │
└─────────────────────────────────────────────────────────────┘
```
### Tenant Isolation
1. **Security Groups**: Each tenant has isolated SGs; cross-tenant traffic is denied by default
2. **ABAC (Attribute-Based Access Control)**: IAM policies require `Tenant` tag match
3. **Resource Tagging**: All resources tagged with `Tenant`, `App`, `Environment`
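
As an illustration of the ABAC rule in item 2, the policy document below allows an action only when the resource's `Tenant` tag equals the caller's `Tenant` principal tag. The data source name and the chosen EC2 actions are hypothetical; the ABAC module may scope different services.

```hcl
# Sketch only: the statement and actions are illustrative.
data "aws_iam_policy_document" "tenant_abac" {
  statement {
    sid       = "TenantScopedInstanceControl"
    effect    = "Allow"
    actions   = ["ec2:StartInstances", "ec2:StopInstances"]
    resources = ["*"]

    condition {
      test     = "StringEquals"
      variable = "aws:ResourceTag/Tenant"
      # "$$" escapes Terraform interpolation so IAM receives ${aws:PrincipalTag/Tenant}.
      values = ["$${aws:PrincipalTag/Tenant}"]
    }
  }
}
```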
### Identity & Authentication
| Component | Authentication Method |
|-----------|----------------------|
| AWS Console | IAM + MFA (configure separately) |
| EKS Cluster | OIDC + IAM Roles for Service Accounts |
| RDS | Password + IAM Database Authentication |
| Secrets | Secrets Manager with rotation support |
## Audit & Logging
### Log Sources
| Source | Destination | Retention |
|--------|-------------|-----------|
| VPC Flow Logs | CloudWatch Logs | 90 days |
| ALB Access Logs | S3 (logs bucket) | 7 years |
| RDS Audit Logs | CloudWatch Logs | 30 days |
| EKS Control Plane | CloudWatch Logs | 30 days |
| CloudTrail | S3 (configure separately) | 7 years recommended |
### Log Protection
- S3 logs bucket: Versioning enabled, lifecycle to Glacier at 90 days
- CloudWatch Logs: Configurable KMS encryption
- Immutable: S3 Object Lock available (enable for compliance)
## Compute Security
### EKS Nodes
- **IMDSv2 Enforced**: Prevents SSRF-based credential theft
- **Hop Limit = 1**: Containers without host networking cannot reach node metadata (IMDS)
- **Encrypted EBS**: All node volumes encrypted
- **Private Subnets**: No public IPs on worker nodes
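
The IMDSv2, hop limit, and EBS encryption settings above map onto a launch template roughly as follows. This is a sketch: the resource name, device name, and volume size are placeholders, and the EKS template may configure them through a module instead.

```hcl
# Sketch only: name, device, and size are placeholders.
resource "aws_launch_template" "eks_nodes" {
  name_prefix = "eks-nodes-"

  metadata_options {
    http_endpoint               = "enabled"
    http_tokens                 = "required" # IMDSv2 only
    http_put_response_hop_limit = 1          # responses do not survive an extra network hop
  }

  block_device_mappings {
    device_name = "/dev/xvda"

    ebs {
      volume_type = "gp3"
      volume_size = 50 # GiB
      encrypted   = true
    }
  }
}
```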
### ECS/Fargate
- **No EC2 Management**: Fargate abstracts host security
- **Task IAM Roles**: Least-privilege per service
- **awsvpc Network Mode**: Each task gets own ENI
### Lambda
- **VPC Optional**: Deploy in VPC for database access
- **X-Ray Tracing**: Request tracking enabled
- **Reserved Concurrency**: Prevent noisy-neighbor DoS
## Data Protection
### Secrets Management
```hcl
# Secrets Manager secret with a deletion recovery window
# (rotation is configured separately, e.g. via aws_secretsmanager_secret_rotation)
resource "aws_secretsmanager_secret" "db" {
  recovery_window_in_days = 30 # Prod: prevent accidental deletion
}
```
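
Rotation itself is attached with a separate resource. A minimal sketch, assuming a rotation Lambda already exists; `rotation_lambda_arn` is a hypothetical variable and the 30-day interval is illustrative.

```hcl
# Sketch only: the rotation Lambda is assumed to exist and is passed in by ARN.
variable "rotation_lambda_arn" {
  type        = string
  description = "ARN of a rotation Lambda (hypothetical; not defined in this document)"
}

resource "aws_secretsmanager_secret_rotation" "db" {
  secret_id           = aws_secretsmanager_secret.db.id
  rotation_lambda_arn = var.rotation_lambda_arn

  rotation_rules {
    automatically_after_days = 30
  }
}
```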
### Database Security
- **No Public Access**: `publicly_accessible = false`
- **Security Group**: Only allows traffic from tenant base SG
- **TLS Required**: Certificate validation enforced
- **IAM Auth**: Token-based authentication available
## Vulnerability Management
### Recommendations
1. **ECR Image Scanning**: Enabled by default (`scan_on_push = true`)
2. **Dependency Scanning**: Use Dependabot or Snyk in CI/CD
3. **tfsec**: Security scanning in GitHub Actions workflow
4. **Amazon Inspector**: Enable for EC2/EKS vulnerability assessment
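
For item 1, scan-on-push is a repository-level setting. The sketch below uses a placeholder repository name; the immutable tag setting is optional hardening added for illustration, not something this document states is configured.

```hcl
# Sketch only: repository name is a placeholder.
resource "aws_ecr_repository" "app" {
  name                 = "example/app"
  image_tag_mutability = "IMMUTABLE" # optional: prevents tags from being overwritten

  image_scanning_configuration {
    scan_on_push = true
  }
}
```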
## Incident Response
### Recommendations
1. **GuardDuty**: Enable for threat detection
2. **Security Hub**: Aggregate findings across services
3. **CloudWatch Alarms**: CPU, connections, storage alerts configured
4. **SNS Topics**: Wire alarms to PagerDuty/Slack
## Compliance Mapping
| Control | HIPAA | SOC 2 | ISO 27001 | HITRUST |
|---------|-------|-------|-----------|---------|
| Encryption at rest | ✓ | ✓ | ✓ | ✓ |
| Encryption in transit | ✓ | ✓ | ✓ | ✓ |
| Access logging | ✓ | ✓ | ✓ | ✓ |
| Network isolation | ✓ | ✓ | ✓ | ✓ |
| Least privilege IAM | ✓ | ✓ | ✓ | ✓ |
| Key management | ✓ | ✓ | ✓ | ✓ |
## What's NOT Included (Configure Separately)
- CloudTrail (account-level, usually in audit account)
- AWS Config Rules
- GuardDuty
- Security Hub
- AWS WAF (per-application decision)
- MFA enforcement (IAM policy)
- Password policies (IAM)
- Backup policies (AWS Backup)
## Cost Considerations
Security features with cost impact:
| Feature | Cost Impact | Recommendation |
|---------|-------------|----------------|
| KMS keys | ~$1/mo per key | Use for production |
| VPC Flow Logs | ~$0.50/GB | Enable for compliance |
| Enhanced Monitoring | ~$0.10/instance/mo | Production only |
| Performance Insights | Free (7 days) | Always enable |
| S3 Glacier | ~$0.004/GB/mo | Use for log archival |