Educational Q&A with LLMs, RAG, and DPO

Large Language Models
Retrieval-Augmented Generation (RAG)
Direct Preference Optimization (DPO)
MCQA
Quantization
Educational AI
Transformer Models

École Polytechnique Fédérale de Lausanne (EPFL) · February – June 2025

LLM Developer

Project Overview

Completed as a semester-long project for the Modern Natural Language Processing (MNLP) course at EPFL, this work aimed to build a robust LLM-powered assistant for answering multiple-choice STEM questions. Our team of four built on the Qwen3-0.6B model through four complementary approaches: supervised MCQA fine-tuning, retrieval-augmented generation (RAG) with FAISS and DPR, Direct Preference Optimization (DPO) on GPT-annotated preference pairs, and quantization for efficiency. Evaluations were performed on the MMLU benchmark and a curated STEM QA set.
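Of these, the DPO stage is the easiest to summarize in code. Below is a minimal sketch, in plain PyTorch, of the DPO objective applied to preference pairs; the tensor names and the β value are illustrative assumptions, not the project's actual training code.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1):
    """DPO loss from sequence-level log-probabilities.

    Each argument has shape (batch,) and holds the summed log-probability
    of the chosen / rejected completion under the policy being trained
    or under the frozen reference model.
    """
    # Implicit rewards: how much the policy prefers each completion
    # relative to the reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

    # Maximize the probability that the chosen answer out-ranks the rejected one.
    loss = -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

    # Fraction of pairs where the implicit reward ranks the chosen answer higher.
    reward_accuracy = (chosen_rewards > rejected_rewards).float().mean()
    return loss, reward_accuracy
```

The `reward_accuracy` quantity here is the metric reported as reward accuracy in the key achievements below.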

Challenges

  • Designing a domain-specific fine-tuning pipeline for a 0.6B parameter LLM
  • Combining dense retrieval and generation in a performant RAG architecture (see the retrieval sketch after this list)
  • Aligning model responses with human preferences via DPO
  • Quantizing models without significant loss in MCQA accuracy
  • Balancing factuality, explainability, and model efficiency in educational use
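As a rough illustration of how dense retrieval and generation were combined, the sketch below builds a FAISS inner-product index over DPR passage embeddings and retrieves context for a question. The DPR checkpoints shown are the public Facebook ones and stand in for the retrievers trained in the project; the passages and question are toy examples.

```python
import faiss
import torch
from transformers import (
    DPRContextEncoder, DPRContextEncoderTokenizer,
    DPRQuestionEncoder, DPRQuestionEncoderTokenizer,
)

# Placeholder DPR checkpoints; the project trained its own retrievers.
ctx_encoder = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
ctx_tokenizer = DPRContextEncoderTokenizer.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
q_encoder = DPRQuestionEncoder.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
q_tokenizer = DPRQuestionEncoderTokenizer.from_pretrained("facebook/dpr-question_encoder-single-nq-base")

passages = [
    "Ohm's law states that V = I * R for an ohmic conductor.",
    "A binary heap supports insertion and extraction in O(log n) time.",
]

# Encode passages and index them for inner-product (maximum dot-product) search.
with torch.no_grad():
    ctx_inputs = ctx_tokenizer(passages, padding=True, truncation=True, return_tensors="pt")
    ctx_embeddings = ctx_encoder(**ctx_inputs).pooler_output  # (num_passages, 768)

index = faiss.IndexFlatIP(ctx_embeddings.shape[1])
index.add(ctx_embeddings.numpy())

# Retrieve the top passage for a question and build the augmented prompt.
question = "What does Ohm's law relate?"
with torch.no_grad():
    q_inputs = q_tokenizer(question, return_tensors="pt")
    q_embedding = q_encoder(**q_inputs).pooler_output.numpy()

scores, ids = index.search(q_embedding, 1)
context = "\n".join(passages[i] for i in ids[0])
prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

In the project, the same pattern was scaled to a 100k-document STEM corpus, with the retrieved passages prepended to the MCQA prompt before generation.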

Key Achievements

  • Achieved 76% reward accuracy with the DPO model, a 20% improvement over the baseline
  • RAG variant for computer science questions reached 76% MCQA accuracy on a curated STEM test set
  • Reduced peak VRAM usage by 75–80% with 4-bit quantized models while preserving key capabilities (see the loading sketch after this list)
  • Built a FAISS-indexed corpus of 100k STEM documents and trained DPR retrievers for precise context retrieval
  • Compiled comprehensive technical reports analyzing trade-offs across architectures, domains, and evaluation methods
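For the quantized variant, the sketch below shows one common way to load a model in 4-bit NF4 via bitsandbytes through Hugging Face Transformers. The checkpoint name, prompt, and generation settings are illustrative assumptions, since the fine-tuned MCQA model itself is not shown here.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 weight quantization; matrix multiplications run in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model_id = "Qwen/Qwen3-0.6B"  # base checkpoint; a fine-tuned MCQA model would load the same way
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

prompt = (
    "Question: Which data structure gives O(log n) insertion?\n"
    "A) Array\nB) Heap\nC) Linked list\nD) Hash table\nAnswer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Storing the weights in 4-bit accounts for the bulk of the VRAM reduction, while keeping the compute dtype in bfloat16 helps limit the loss in MCQA accuracy.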

Technologies Used

Qwen3-0.6B
PyTorch
Hugging Face Transformers
FAISS
Dense Passage Retrieval (DPR)
Direct Preference Optimization (DPO)
Quantization