- Add config.py with dataclass-based configuration from env vars
- Remove hardcoded RunPod endpoint and credentials
- Consolidate duplicate training components into single reusable function
- Add .env.example with all configurable options
- Update README with environment variable documentation
- Add Kubernetes secrets example for production deployments
- Improve timeout and error handling
BREAKING: Pipeline parameters now default to values read from environment variables.
Set RUNPOD_API_KEY, RUNPOD_ENDPOINT, S3_BUCKET, and AWS credentials before running the pipeline.
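For reviewers, a minimal sketch of what dataclass-based configuration from env vars can look like. This is an illustration, not the actual contents of config.py: the class name `PipelineConfig`, its field names, and the `REQUEST_TIMEOUT_S` variable are hypothetical, and the AWS credentials are assumed to use the standard `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Variables named in this commit, plus the standard AWS credential names (assumed).
REQUIRED_VARS = (
    "RUNPOD_API_KEY",
    "RUNPOD_ENDPOINT",
    "S3_BUCKET",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
)


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical config container; the real config.py may differ."""

    runpod_api_key: str
    runpod_endpoint: str
    s3_bucket: str
    aws_access_key_id: str
    aws_secret_access_key: str
    request_timeout_s: int = 300  # hypothetical optional setting

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "PipelineConfig":
        """Build the config from a mapping (os.environ by default), failing fast."""
        env = os.environ if env is None else env
        # Treat unset and empty values the same way: both count as missing.
        missing = [name for name in REQUIRED_VARS if not env.get(name)]
        if missing:
            raise RuntimeError(
                "Missing required environment variables: " + ", ".join(missing)
            )
        return cls(
            runpod_api_key=env["RUNPOD_API_KEY"],
            runpod_endpoint=env["RUNPOD_ENDPOINT"],
            s3_bucket=env["S3_BUCKET"],
            aws_access_key_id=env["AWS_ACCESS_KEY_ID"],
            aws_secret_access_key=env["AWS_SECRET_ACCESS_KEY"],
            request_timeout_s=int(env.get("REQUEST_TIMEOUT_S", "300")),
        )
```

Passing the mapping explicitly keeps the loader testable without touching the real process environment. Reporting every missing variable in one error avoids fixing them one failed run at a time.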
New tasks supported:
- task=ade: Adverse Drug Event classification (ADE Corpus V2, 30K samples)
- task=triage: Medical Triage classification (urgency levels)
- task=symptom_disease: Symptom-to-Disease prediction (40+ diseases)
All three tasks use Hugging Face datasets, Bio_ClinicalBERT, and S3 model storage.
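One way the single reusable training function could dispatch on the `task` parameter is a small registry, sketched below. Only the task names and descriptions come from this commit; `TaskSpec`, `TASKS`, and `resolve_task` are hypothetical names, and dataset identifiers are deliberately left out because they are not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSpec:
    """Hypothetical description of one supported task."""

    name: str
    description: str


# Task names and descriptions as listed in this commit message.
TASKS = {
    "ade": TaskSpec("ade", "Adverse Drug Event classification"),
    "triage": TaskSpec("triage", "Medical Triage classification"),
    "symptom_disease": TaskSpec("symptom_disease", "Symptom-to-Disease prediction"),
}


def resolve_task(task: str) -> TaskSpec:
    """Return the spec for a task name, rejecting unknown names early."""
    try:
        return TASKS[task]
    except KeyError:
        raise ValueError(
            f"Unknown task {task!r}; expected one of {sorted(TASKS)}"
        ) from None
```

Validating the task name before any data is downloaded or a GPU job is submitted keeps a typo from surfacing as a late failure.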