refactor: environment variable configuration for all pipeline settings

- Add config.py with dataclass-based configuration from env vars
- Remove hardcoded RunPod endpoint and credentials
- Consolidate duplicate training components into single reusable function
- Add .env.example with all configurable options
- Update README with environment variable documentation
- Add Kubernetes secrets example for production deployments
- Improve timeout and error handling

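BREAKING: Pipeline parameters now read their defaults from environment variables.
Set RUNPOD_API_KEY, RUNPOD_ENDPOINT, S3_BUCKET, and the AWS credentials
(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) before running the pipeline.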
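
A minimal sketch of a fail-fast check for the variables named above. It assumes the AWS credentials are supplied as AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, as in .env.example. The helper name `missing_required_vars` is hypothetical and is not taken from config.py.

```python
import os

# Variables named in the BREAKING note; the two AWS names are an
# assumption based on .env.example.
REQUIRED_VARS = (
    "RUNPOD_API_KEY",
    "RUNPOD_ENDPOINT",
    "S3_BUCKET",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)


def missing_required_vars(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

Calling `missing_required_vars()` at startup and raising when the list is non-empty would surface a misconfigured environment before any training job is submitted.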
2026-02-03 20:47:27 +00:00
parent 419918460d
commit 5f554ea769
4 changed files with 490 additions and 226 deletions

.env.example

@@ -0,0 +1,37 @@
# =============================================================================
# Healthcare ML Pipeline Configuration
# =============================================================================
# Copy this file to .env and fill in your values.
# DO NOT commit .env to version control!
# -----------------------------------------------------------------------------
# RunPod Configuration (Required)
# -----------------------------------------------------------------------------
RUNPOD_API_KEY=your_runpod_api_key_here
RUNPOD_ENDPOINT=your_endpoint_id_here
RUNPOD_API_BASE=https://api.runpod.ai/v2
# -----------------------------------------------------------------------------
# AWS Configuration (Required for model storage)
# -----------------------------------------------------------------------------
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
# Optional - for assumed role sessions
AWS_SESSION_TOKEN=
AWS_REGION=us-east-1
S3_BUCKET=your-model-bucket
# -----------------------------------------------------------------------------
# Model Training Defaults (Optional - sensible defaults provided)
# -----------------------------------------------------------------------------
BASE_MODEL=emilyalsentzer/Bio_ClinicalBERT
MAX_SAMPLES=10000
EPOCHS=3
BATCH_SIZE=16
EVAL_SPLIT=0.1
LEARNING_RATE=2e-5
# -----------------------------------------------------------------------------
# Pipeline Runtime Settings (Optional)
# -----------------------------------------------------------------------------
# How often to check training status
POLL_INTERVAL_SECONDS=10
# Max training time (1 hour default)
TRAINING_TIMEOUT_SECONDS=3600
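
A rough sketch of how a dataclass could pick up the optional settings above, falling back to the defaults shown in .env.example when a variable is unset or blank. The class name `TrainingSettings` and the `from_env` helper are hypothetical; the actual config.py is not shown here and may be structured differently.

```python
import os
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TrainingSettings:
    # Defaults mirror the optional values in .env.example.
    base_model: str = "emilyalsentzer/Bio_ClinicalBERT"
    max_samples: int = 10000
    epochs: int = 3
    batch_size: int = 16
    eval_split: float = 0.1
    learning_rate: float = 2e-5
    poll_interval_seconds: int = 10
    training_timeout_seconds: int = 3600

    @classmethod
    def from_env(cls, env=None):
        """Build settings from env vars named after the upper-cased fields."""
        env = os.environ if env is None else env
        overrides = {}
        for f in fields(cls):
            raw = env.get(f.name.upper(), "").strip()
            if raw:
                # Convert using the type of the field's default (str, int, float).
                overrides[f.name] = type(f.default)(raw)
        return cls(**overrides)
```

A malformed value such as `EPOCHS=three` raises `ValueError` during conversion, so a bad setting is reported at load time instead of partway through a training run.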