# Healthcare ML Training Pipeline
Serverless GPU training infrastructure for healthcare NLP models. Training runs on RunPod serverless GPUs, with trained models stored in S3.
## Overview
This project provides production-ready ML pipelines for training healthcare classification models:
- Drug-Drug Interaction (DDI) - Severity classification from DrugBank (176K samples)
- Adverse Drug Events (ADE) - Binary detection from ADE Corpus V2 (30K samples)
- Medical Triage - Urgency level classification
- Symptom-to-Disease - Diagnosis prediction (41 disease classes)
All models use Bio_ClinicalBERT as the base and are fine-tuned on domain-specific datasets.
## Training Results
| Task | Dataset | Samples | Accuracy | F1 Score |
|---|---|---|---|---|
| DDI Classification | DrugBank | 176K | 100% | 100% |
| ADE Detection | ADE Corpus V2 | 9K | 93.5% | 95.3% |
| Symptom-Disease | Disease Symptoms | 4.4K | 100% | 100% |
## Quick Start

### Run Training
```bash
curl -X POST "https://api.runpod.ai/v2/YOUR_ENDPOINT/run" \
  -H "Authorization: Bearer $RUNPOD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "task": "ddi",
      "model_name": "emilyalsentzer/Bio_ClinicalBERT",
      "max_samples": 10000,
      "epochs": 3,
      "batch_size": 16,
      "s3_bucket": "your-bucket",
      "aws_access_key_id": "...",
      "aws_secret_access_key": "...",
      "aws_session_token": "..."
    }
  }'
```
Available tasks: `ddi`, `ade`, `triage`, `symptom_disease`
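The same job can be submitted from Python. A minimal sketch using `requests`, assuming the standard RunPod serverless `/run` and `/status/{id}` routes and the environment variables described under Configuration below:

```python
import os
import time

import requests

# Assumes RUNPOD_API_KEY, RUNPOD_ENDPOINT, S3_BUCKET, and AWS creds are set (see Configuration).
API_KEY = os.environ["RUNPOD_API_KEY"]
ENDPOINT = os.environ["RUNPOD_ENDPOINT"]
BASE_URL = f"https://api.runpod.ai/v2/{ENDPOINT}"
HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}

payload = {
    "input": {
        "task": "ddi",
        "model_name": "emilyalsentzer/Bio_ClinicalBERT",
        "max_samples": 10000,
        "epochs": 3,
        "batch_size": 16,
        "s3_bucket": os.environ["S3_BUCKET"],
        "aws_access_key_id": os.environ["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": os.environ["AWS_SECRET_ACCESS_KEY"],
    }
}

# Submit the job, then poll until it reaches a terminal state.
job = requests.post(f"{BASE_URL}/run", json=payload, headers=HEADERS, timeout=30).json()
job_id = job["id"]
while True:
    status = requests.get(f"{BASE_URL}/status/{job_id}", headers=HEADERS, timeout=30).json()
    if status["status"] in ("COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"):
        print(status)
        break
    time.sleep(30)
```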
### Download Trained Model
```bash
aws s3 cp s3://your-bucket/model.tar.gz .
tar -xzf model.tar.gz
```
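After extraction, the model can be loaded with the `transformers` library. A small sketch, assuming the archive unpacks to a directory of Hugging Face model files (`./model` here is a placeholder, not a path the pipeline guarantees):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# "./model" is illustrative; point this at whatever directory the tarball actually unpacks to.
model_dir = "./model"
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSequenceClassification.from_pretrained(model_dir)

# Example inference on a drug-pair sentence for the DDI task.
inputs = tokenizer("Aspirin may increase the anticoagulant effect of warfarin.", return_tensors="pt")
predicted_class = model(**inputs).logits.argmax(dim=-1).item()
print(predicted_class)
```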
## Project Structure

```
├── components/
│   └── runpod_trainer/
│       ├── Dockerfile
│       ├── handler.py             # Multi-task training logic
│       ├── requirements.txt
│       └── data/                  # DrugBank DDI dataset
├── pipelines/
│   ├── healthcare_training.py     # Kubeflow pipeline definitions
│   ├── ddi_training_runpod.py
│   └── ddi_data_prep.py
├── .github/workflows/
│   └── build-trainer.yaml         # CI/CD
└── manifests/
    └── argocd-app.yaml
```
## Configuration

All configuration is via environment variables. Copy `.env.example` to `.env` and fill in your values:

```bash
cp .env.example .env
# Edit .env with your credentials
```
### Environment Variables

| Variable | Required | Default | Description |
|---|---|---|---|
| `RUNPOD_API_KEY` | Yes | - | RunPod API key |
| `RUNPOD_ENDPOINT` | Yes | - | RunPod serverless endpoint ID |
| `AWS_ACCESS_KEY_ID` | Yes | - | AWS credentials for S3 |
| `AWS_SECRET_ACCESS_KEY` | Yes | - | AWS credentials for S3 |
| `AWS_SESSION_TOKEN` | No | - | For assumed role sessions |
| `AWS_REGION` | No | us-east-1 | AWS region |
| `S3_BUCKET` | Yes | - | Bucket for model artifacts |
| `BASE_MODEL` | No | Bio_ClinicalBERT | HuggingFace model ID |
| `MAX_SAMPLES` | No | 10000 | Training samples |
| `EPOCHS` | No | 3 | Training epochs |
| `BATCH_SIZE` | No | 16 | Batch size |
### Kubernetes Secrets (Recommended)

For production deployments, store credentials in a Kubernetes Secret rather than a local `.env` file:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: ml-pipeline-secrets
type: Opaque
stringData:
  RUNPOD_API_KEY: "your-key"
  AWS_ACCESS_KEY_ID: "your-key"
  AWS_SECRET_ACCESS_KEY: "your-secret"
```
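If a Kubeflow pipeline task needs these values, the Secret can be exposed as environment variables on the task. A sketch using the `kfp-kubernetes` extension; the component, pipeline name, and wiring here are illustrative, not the repo's actual pipeline definitions:

```python
from kfp import dsl, kubernetes


@dsl.component
def train_op():
    import os
    # The Secret keys arrive as environment variables inside the component container.
    print("RunPod key present:", "RUNPOD_API_KEY" in os.environ)


@dsl.pipeline(name="healthcare-training")
def healthcare_training():
    task = train_op()
    # Map keys from the ml-pipeline-secrets Secret onto env vars of this task.
    kubernetes.use_secret_as_env(
        task,
        secret_name="ml-pipeline-secrets",
        secret_key_to_env={
            "RUNPOD_API_KEY": "RUNPOD_API_KEY",
            "AWS_ACCESS_KEY_ID": "AWS_ACCESS_KEY_ID",
            "AWS_SECRET_ACCESS_KEY": "AWS_SECRET_ACCESS_KEY",
        },
    )
```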
## Supported Models
| Model | Type | Use Case |
|---|---|---|
| `emilyalsentzer/Bio_ClinicalBERT` | BERT | Classification tasks |
| `meta-llama/Llama-3.1-8B-Instruct` | LLM | Text generation (LoRA) |
| `google/gemma-3-4b-it` | LLM | Lightweight inference |
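For the LLM entries, LoRA fine-tuning is typically set up with the `peft` library. A hedged sketch of such a setup; the rank, target modules, and other hyperparameters are illustrative and may differ from what `handler.py` uses:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Illustrative hyperparameters; the trainer's actual LoRA settings may differ.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the LoRA adapter weights are trainable
```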
## Parameters
| Parameter | Default | Description |
|---|---|---|
| `task` | ddi | Training task |
| `model_name` | Bio_ClinicalBERT | HuggingFace model ID |
| `max_samples` | 10000 | Training samples |
| `epochs` | 3 | Training epochs |
| `batch_size` | 16 | Batch size |
| `eval_split` | 0.1 | Validation split |
| `s3_bucket` | - | S3 bucket for output |
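These request parameters map naturally onto a Hugging Face `Trainer`. A hedged sketch of how a handler might apply them; the column names, output paths, and exact training loop in `components/runpod_trainer/handler.py` may differ:

```python
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)


def build_trainer(dataset, params, num_labels):
    """Wire the request parameters into a Trainer (illustrative, not the repo's exact code)."""
    tokenizer = AutoTokenizer.from_pretrained(params["model_name"])
    model = AutoModelForSequenceClassification.from_pretrained(
        params["model_name"], num_labels=num_labels
    )

    def tokenize(batch):
        # Assumes a "text" column (and a "labels" column for training); actual names depend on the dataset.
        return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

    # max_samples caps the dataset; eval_split carves out a validation set.
    dataset = dataset.select(range(min(params["max_samples"], len(dataset))))
    splits = dataset.map(tokenize, batched=True).train_test_split(test_size=params["eval_split"])

    args = TrainingArguments(
        output_dir="/tmp/output",
        num_train_epochs=params["epochs"],
        per_device_train_batch_size=params["batch_size"],
        per_device_eval_batch_size=params["batch_size"],
    )
    return Trainer(model=model, args=args, train_dataset=splits["train"], eval_dataset=splits["test"])
```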
## Development

```bash
# Build container
cd components/runpod_trainer
docker build -t healthcare-trainer .

# Trigger CI build
gh workflow run build-trainer.yaml
```
## License
MIT