feat: Add 176K real DrugBank DDI samples with drug names

- Downloaded 191K DDI pairs from TDC DrugBank
- Fetched 1,634 drug names from PubChem API (96% hit rate)
- Created complete training dataset with:
  - Real drug names (not just IDs)
  - 86 interaction type descriptions
  - Severity labels (minor/moderate/major/contraindicated)
- Bundled 34MB data file in Docker image
- Handler loads real data instead of curated samples
2026-02-03 04:34:54 +00:00
parent 39922e8d2e
commit 67a1095100
4 changed files with 176245 additions and 153 deletions

@@ -8,11 +8,13 @@ COPY requirements.txt /app/requirements.txt
 # Install dependencies
 RUN pip install --no-cache-dir -r requirements.txt
-# Copy handler
+# Copy handler and data
 COPY handler.py /app/handler.py
+COPY data/ /app/data/
 # Set environment variables
 ENV PYTHONUNBUFFERED=1
 ENV HF_HOME=/tmp/huggingface
+ENV DDI_DATA_PATH=/app/data/drugbank_ddi_complete.jsonl
 CMD ["python", "-u", "handler.py"]
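The handler side of this change is not shown in this hunk. As a rough sketch only, a handler could resolve the bundled file through the DDI_DATA_PATH variable set above and stream it one JSON object per line. The function name iter_ddi_records and the fallback constant are hypothetical, and the record schema is not assumed here, so each line is returned as a plain dict.

```python
import json
import os
from pathlib import Path
from typing import Iterator, Optional

# Fallback mirrors the ENV value in the Dockerfile diff; the constant name is hypothetical.
DEFAULT_DDI_PATH = "/app/data/drugbank_ddi_complete.jsonl"


def iter_ddi_records(path: Optional[str] = None) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSONL file.

    The path comes from the argument, then DDI_DATA_PATH, then the default.
    This is an illustrative helper, not the function used in handler.py.
    """
    resolved = Path(path or os.environ.get("DDI_DATA_PATH", DEFAULT_DDI_PATH))
    with resolved.open("r", encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Report the file and line so a bad record is easy to locate.
                raise ValueError(f"{resolved}:{line_no}: invalid JSON") from exc
```

Streaming line by line avoids holding the whole 34MB file as one string, although a handler that needs random access would likely materialize the records with list(iter_ddi_records()) once at startup.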