- Add upload_to_s3 function to handler
- Save trained BERT models to S3 when credentials provided
- Save LoRA adapters to S3 when credentials provided
- Input params: s3_bucket, s3_prefix, aws_access_key_id, aws_secret_access_key, aws_region
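A minimal sketch of the helper, assuming the parameters listed above; the handler's exact signature may differ:

```python
import os

import boto3

def upload_to_s3(local_dir, s3_bucket, s3_prefix,
                 aws_access_key_id, aws_secret_access_key, aws_region):
    """Recursively upload a saved model or adapter directory to s3://bucket/prefix/."""
    s3 = boto3.client(
        "s3",
        aws_access_key_id=aws_access_key_id,
        aws_secret_access_key=aws_secret_access_key,
        region_name=aws_region,
    )
    for root, _dirs, files in os.walk(local_dir):
        for name in files:
            path = os.path.join(root, name)
            key = f"{s3_prefix}/{os.path.relpath(path, local_dir)}"
            s3.upload_file(path, s3_bucket, key)
    return f"s3://{s3_bucket}/{s3_prefix}/"
```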
- Use runpod/pytorch:1.0.3-cu1290-torch260-ubuntu2204 base image
- Torch 2.6.0 required by transformers for secure model loading
- CUDA 12.9 compatible
- Default model now meta-llama/Llama-3.1-8B-Instruct
- Added multi-model chat format support:
- Llama 3 format
- Mistral/Mixtral format
- Qwen format
- Gemma format
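These map onto each family's documented prompt template; a sketch of the dispatch, where `prompt_for` is a hypothetical helper name:

```python
def prompt_for(model_id: str, user_msg: str, assistant_msg: str) -> str:
    """Wrap a (user, assistant) pair in the chat template for the model family."""
    m = model_id.lower()
    if "llama-3" in m:
        return (
            "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
            f"{user_msg}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
            f"{assistant_msg}<|eot_id|>"
        )
    if "mistral" in m or "mixtral" in m:
        return f"<s>[INST] {user_msg} [/INST] {assistant_msg}</s>"
    if "qwen" in m:  # ChatML
        return (
            f"<|im_start|>user\n{user_msg}<|im_end|>\n"
            f"<|im_start|>assistant\n{assistant_msg}<|im_end|>"
        )
    if "gemma" in m:
        return (
            f"<start_of_turn>user\n{user_msg}<end_of_turn>\n"
            f"<start_of_turn>model\n{assistant_msg}<end_of_turn>"
        )
    raise ValueError(f"No chat format registered for {model_id}")
```

Where the tokenizer ships its own template, `tokenizer.apply_chat_template` covers the same ground without hand-written strings.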
- Trained models can be imported into AWS Bedrock via Custom Model Import
- Downloaded 191K DDI pairs from TDC DrugBank
- Fetched 1,634 drug names from PubChem API (96% hit rate)
- Created complete training dataset with:
- Real drug names (not just IDs)
- 86 interaction type descriptions
- Severity labels (minor/moderate/major/contraindicated)
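A hypothetical sketch of the prep step: resolving a DrugBank ID to a real name through PubChem's PUG REST xref lookup, plus the shape of one resulting record. The endpoint form, field names, and `fetch_drug_name` are assumptions, not the exact code:

```python
import requests

def fetch_drug_name(drugbank_id: str) -> str | None:
    """Resolve a DrugBank ID (e.g. DB00682) to a compound name via PubChem."""
    url = (
        "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/"
        f"xref/RegistryID/{drugbank_id}/property/Title/JSON"
    )
    resp = requests.get(url, timeout=10)
    if resp.status_code != 200:
        return None  # lookups that miss account for the ~4% gap in the hit rate
    props = resp.json().get("PropertyTable", {}).get("Properties", [])
    return props[0]["Title"] if props else None

# Shape of one record in the bundled training file (fields illustrative):
record = {
    "drug_1": "Warfarin",
    "drug_2": "Aspirin",
    "interaction": "may increase the anticoagulant activities of",
    "severity": "major",  # minor / moderate / major / contraindicated
}
```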
- Bundled 34MB data file in Docker image
- Handler loads real data instead of curated samples
- Added PEFT, bitsandbytes, TRL for LoRA training
- 4-bit QLoRA quantization so an 8B model fits on a 48GB GPU
- Instruction-tuning format for Gemma chat template
- Auto-detect model type (BERT vs LLM)
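A sketch of the 4-bit QLoRA setup and the model-type detection, using standard transformers/peft/bitsandbytes APIs; the rank, target modules, and encoder list are illustrative rather than the handler's exact values:

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoConfig, AutoModelForCausalLM, BitsAndBytesConfig

def is_bert_like(model_id: str) -> bool:
    """Route encoder models to the classification path, decoder LLMs to LoRA."""
    cfg = AutoConfig.from_pretrained(model_id)
    return cfg.model_type in {"bert", "distilbert", "roberta", "deberta-v2"}

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # QLoRA: NF4 base weights
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,          # extra memory savings
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)          # only adapter weights train
```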
- Updated GPU tier to ADA_24/AMPERE_48
- Switch to self-hosted runner on compute-01 for faster builds
- Replace PyTDC with curated DDI dataset (no heavy deps)
- 60+ real drug interaction patterns based on clinical guidelines
- Generates up to 10K training samples with text variations
- Maintains 5-level severity classification
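A minimal sketch of how such a generator can expand curated patterns into samples; the two `PATTERNS` entries and the template wording are illustrative:

```python
import itertools
import random

PATTERNS = [
    # (drug_1, drug_2, effect, severity 0-4) -- two of the 60+ curated entries
    ("warfarin", "aspirin", "an increased risk of bleeding", 3),
    ("simvastatin", "clarithromycin", "an increased risk of myopathy", 3),
]
TEMPLATES = [
    "{a} taken with {b} may cause {effect}.",
    "Co-administration of {a} and {b} is associated with {effect}.",
    "Patients on {a} who start {b} face {effect}.",
]

def generate_samples(limit: int = 10_000) -> list[dict]:
    pool = [
        {"text": tpl.format(a=a, b=b, effect=effect), "label": severity}
        for (a, b, effect, severity), tpl in itertools.product(PATTERNS, TEMPLATES)
    ]
    random.shuffle(pool)
    return pool[:limit]
```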
- Added PyTDC dependency for DrugBank access
- Implemented DDI type -> severity label mapping (0-4)
- Added train/eval split with stratification
- Added accuracy and F1 metrics for evaluation
- Default: 50K samples from DrugBank DDI
- Supports both real data and custom inline data
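A sketch of the type-to-severity mapping, the stratified split, and a metrics hook in the Trainer convention; the severity buckets and placeholder data are illustrative:

```python
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# DDI interaction type -> severity label (0-4); buckets abbreviated here
TYPE_TO_SEVERITY = {
    "may decrease the excretion rate of": 1,
    "may increase the anticoagulant activities of": 3,
    # ... remaining DrugBank DDI types bucketed the same way
}

# Placeholder data standing in for the 50K DrugBank DDI samples
texts = [f"interaction sentence {i}" for i in range(100)]
labels = [i % 5 for i in range(100)]

train_texts, eval_texts, train_labels, eval_labels = train_test_split(
    texts, labels, test_size=0.1, stratify=labels, random_state=42
)

def compute_metrics(eval_pred):
    """Accuracy and weighted F1, in the shape transformers.Trainer expects."""
    logits, gold = eval_pred
    preds = logits.argmax(axis=-1)
    return {
        "accuracy": accuracy_score(gold, preds),
        "f1": f1_score(gold, preds, average="weighted"),
    }
```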