© 2025 iAMVamsi.

AI/Life Sciences

Clinical Text Summarization Using Large Language Models

Replicated a research study on hospital course summarization using Clinical-T5, LLaMA2-13B, and GPT-4 with the MIMIC-IV dataset, implementing advanced fine-tuning techniques and comprehensive evaluation.

Completed: June 23, 2025 • AI/Life Sciences
Python, LLM, Healthcare NLP, Fine-tuning, Clinical AI, BERT, QLoRA
Source code available on request.

Project Overview

Replicated a research paper on hospital course summarization with Large Language Models (LLMs) using the MIMIC-IV dataset. The clinical summarization pipeline compares Clinical-T5, LLaMA2-13B, and GPT-4 for generating Brief Hospital Course (BHC) summaries. Preprocessing covered regex-based BHC extraction, clinical text normalization, and segmentation of the dataset into context bins by token count (short ≤1024, medium 1025-2048, long >2048). Training relied on 4-bit quantization, the Unsloth framework, and QLoRA fine-tuning; with this setup LLaMA2-13B reached a BERTScore F1 of 0.683.
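The context bins described above can be sketched as a small helper. This is a minimal illustration, not the project's actual code: the function names `context_bin` and `bin_records` are hypothetical, and it assumes token counts have already been computed with whatever tokenizer the pipeline uses. Only the thresholds (1024 and 2048) come from the description.

```python
from typing import Dict, Iterable, List

# Thresholds taken from the project description; everything else is illustrative.
SHORT_MAX = 1024
MEDIUM_MAX = 2048


def context_bin(n_tokens: int) -> str:
    """Map a token count to 'short', 'medium', or 'long'."""
    if n_tokens < 0:
        raise ValueError("token count must be non-negative")
    if n_tokens <= SHORT_MAX:
        return "short"
    if n_tokens <= MEDIUM_MAX:
        return "medium"
    return "long"


def bin_records(token_counts: Iterable[int]) -> Dict[str, List[int]]:
    """Group record indices by context bin, preserving input order."""
    bins: Dict[str, List[int]] = {"short": [], "medium": [], "long": []}
    for idx, n_tokens in enumerate(token_counts):
        bins[context_bin(n_tokens)].append(idx)
    return bins
```

Binning by index rather than by record keeps the helper independent of how the notes themselves are stored.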

Key Features

  • Clinical text summarization pipeline
  • Multi-model comparison (Clinical-T5, LLaMA2-13B, GPT-4)
  • MIMIC-IV dataset preprocessing
  • Regex-based Brief Hospital Course extraction
  • Clinical text normalization and validation
  • Context-based dataset binning (short/medium/long)
  • QLoRA fine-tuning implementation
  • 4-bit quantization for memory efficiency
  • Unsloth framework integration
  • Comprehensive hyperparameter tuning
  • BERTScore evaluation metrics
  • Zero-shot and prefix prompting strategies
  • HPC infrastructure deployment
  • WandB integration for monitoring
  • Memory optimization techniques
  • Production-ready clinical NLP pipeline
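As an illustration of the regex-based Brief Hospital Course extraction and text normalization listed above, here is a minimal sketch. It rests on assumptions that are not confirmed by this page: that the section starts with a "Brief Hospital Course:" header and ends at the next blank-line-separated header of the form "Some Title:". The pattern and the name `extract_bhc` are hypothetical, and real notes would likely need a more defensive pattern.

```python
import re
from typing import Optional

# Assumed layout (hypothetical): the BHC section begins at its header and runs
# until a blank line followed by another short "Header:" line, or end of text.
BHC_PATTERN = re.compile(
    r"Brief Hospital Course:\s*(?P<bhc>.*?)"
    r"(?=\n[ \t]*\n[^\n:]{1,60}:[ \t]*(?:\n|\Z)|\Z)",
    re.IGNORECASE | re.DOTALL,
)


def extract_bhc(note: str) -> Optional[str]:
    """Return the whitespace-normalized BHC section, or None if it is absent."""
    match = BHC_PATTERN.search(note)
    if match is None:
        return None
    # Basic normalization: collapse runs of whitespace and newlines.
    text = re.sub(r"\s+", " ", match.group("bhc")).strip()
    return text or None
```

Returning `None` for missing or empty sections makes it easy to filter out notes that cannot serve as summarization targets.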

Technical Challenges

  • Handling confidential healthcare data with proper certification
  • Managing large-scale clinical text preprocessing
  • Optimizing memory usage for 13B parameter models
  • Balancing performance across different context lengths
  • Implementing efficient fine-tuning strategies
  • Managing 8+ hours of intensive computation
  • Ensuring clinical relevance and accuracy
  • Deploying on HPC infrastructure with SLURM
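To make the memory challenge for a 13B parameter model concrete, the back-of-the-envelope arithmetic below compares weight storage at 16-bit and 4-bit precision. It is a rough sketch under stated assumptions: it treats the model as exactly 13 billion parameters, counts weights only, and ignores quantization constants, activations, adapter weights, and optimizer state. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 13e9  # nominal parameter count (assumption: exactly 13 billion)

fp16_gb = weight_memory_gb(N_PARAMS, 16)  # 16-bit weights
int4_gb = weight_memory_gb(N_PARAMS, 4)   # 4-bit quantized weights

print(f"16-bit weights: ~{fp16_gb:.1f} GB")
print(f" 4-bit weights: ~{int4_gb:.1f} GB")
```

Under these assumptions the weights shrink by a factor of four, which is the main reason 4-bit quantization is attractive when fine-tuning a model of this size on limited GPU memory.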

Technologies Used

Python, PyTorch, HuggingFace Transformers, Clinical-T5, LLaMA2-13B, GPT-4, QLoRA, Unsloth, BERT Score, CUDA, SLURM, WandB, MIMIC-IV, Healthcare NLP

Project Info

Category: AI/Life Sciences
Completed: June 23, 2025
Featured: Yes

Related Projects

NLP Pipeline for Medical Data Processing

Built an NLP pipeline to process Medline XML and ChEBI ontology data for clinical research and pharmaceutical applications.

December 1, 2024 • AI/Life Sciences

Comparative LLM Fine-tuning for Knowledge Extraction

Ran systematic comparative experiments on fine-tuning Mistral-7B with three distinct approaches on the NewsKG21 dataset to improve knowledge extraction performance.

November 15, 2024 • AI/Life Sciences

Bio-Inspired Optimization for Personalized Diabetes Management

Developed a bio-inspired optimization system integrating genetic algorithms with physiological modeling for personalized Type 2 diabetes management.

April 20, 2025 • AI/Life Sciences
