- Downloaded 191K drug-drug interaction (DDI) pairs from TDC DrugBank
- Fetched 1,634 drug names from PubChem API (96% hit rate)
- Created complete training dataset with:
  - Real drug names (not just IDs)
  - 86 interaction type descriptions
  - Severity labels (minor/moderate/major/contraindicated)
- Bundled 34MB data file in Docker image
- Handler loads real data instead of curated samples
- Added PEFT, bitsandbytes, TRL for LoRA training
- 4-bit QLoRA quantization so the model fits on a 48GB GPU
- Instruction-tuning format for Gemma chat template
- Auto-detect model type (BERT vs LLM)
- Updated GPU tier to ADA_24/AMPERE_48
- Switched to self-hosted runner on compute-01 for faster builds
- Replaced PyTDC with curated DDI dataset (no heavy deps)
- 60+ real drug interaction patterns based on clinical guidelines
- Generates up to 10K training samples with text variations
- Maintains 5-level severity classification
- Added PyTDC dependency for DrugBank access
- Implemented DDI type -> severity label mapping (0-4)
- Added train/eval split with stratification
- Added accuracy and F1 metrics for evaluation
- Default: 50K samples from DrugBank DDI
- Supports both real data and custom inline data
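
For reference, a minimal standard-library sketch of the type -> severity mapping, stratified train/eval split, and accuracy/F1 metrics listed above. It is not the handler's actual code: the `TYPE_TO_SEVERITY` table, the fallback label, the 80/20 split, and macro averaging for F1 are all assumptions made for illustration, and every function name here is hypothetical.

```python
import random
from collections import defaultdict

# Hypothetical table: DDI type id -> severity label in 0-4.
# The real mapping is not shown in this log; these entries are placeholders.
TYPE_TO_SEVERITY = {1: 0, 2: 1, 3: 2, 4: 3, 5: 4}
DEFAULT_SEVERITY = 2  # assumed fallback for unmapped DDI types


def to_severity(ddi_type):
    """Map a DDI type id to a severity label, falling back to the default."""
    return TYPE_TO_SEVERITY.get(ddi_type, DEFAULT_SEVERITY)


def stratified_split(samples, labels, eval_fraction=0.2, seed=42):
    """Split so each label keeps roughly the same share in train and eval."""
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)

    rng = random.Random(seed)
    train, evals = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        # Keep at least one eval sample per class when the class has 2+ items.
        n_eval = max(1, round(len(group) * eval_fraction)) if len(group) > 1 else 0
        evals += [(s, label) for s in group[:n_eval]]
        train += [(s, label) for s in group[n_eval:]]
    return train, evals


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    scores = []
    for cls in sorted(set(y_true)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Splitting per label rather than over the whole dataset keeps rare severity classes represented in the eval set, which matters if the severity distribution is skewed. In a real pipeline, scikit-learn's `train_test_split(..., stratify=labels)` and `f1_score` would likely replace the hand-rolled versions.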