Educational Q&A with LLMs, RAG, and DPO

Large Language Models
Retrieval-Augmented Generation (RAG)
Direct Preference Optimization (DPO)
MCQA
Quantization
Educational AI
Transformer Models

École Polytechnique Fédérale de Lausanne (EPFL) · February – June 2025

LLM Developer

Project Overview

Completed as a semester-long project for the Modern Natural Language Processing (MNLP) course at EPFL, this work aimed to build a robust LLM-powered assistant for answering multiple-choice STEM questions. Our team of four built on the Qwen3-0.6B model through four complementary approaches: supervised MCQA fine-tuning, retrieval-augmented generation (RAG) with FAISS and DPR, Direct Preference Optimization (DPO) on GPT-annotated preference pairs, and quantization for efficiency. Evaluations were performed on the MMLU benchmark and a curated STEM QA set.
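Of these, the DPO stage is the easiest to summarize in code. Below is a minimal sketch, in plain PyTorch, of the DPO objective applied to preference pairs; the tensor names and the β value are illustrative assumptions, not the project's actual training code.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1):
    """DPO loss from sequence-level log-probabilities.

    Each argument has shape (batch,) and holds the summed log-probability
    of the chosen / rejected completion under the policy being trained
    or under the frozen reference model.
    """
    # Implicit rewards: how much the policy prefers each completion
    # relative to the reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)

    # Maximize the probability that the chosen answer out-ranks the rejected one.
    loss = -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

    # Fraction of pairs where the implicit reward ranks the chosen answer higher.
    reward_accuracy = (chosen_rewards > rejected_rewards).float().mean()
    return loss, reward_accuracy
```

The `reward_accuracy` quantity here is the metric reported as reward accuracy in the key achievements below.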

Challenges

  • Designing a domain-specific fine-tuning pipeline for a 0.6B parameter LLM
  • Combining dense retrieval and generation in a performant RAG architecture (see the retrieval sketch after this list)
  • Aligning model responses with human preferences via DPO
  • Quantizing models without significant loss in MCQA accuracy
  • Balancing factuality, explainability, and model efficiency in educational use
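As a rough illustration of how dense retrieval and generation were combined, the sketch below builds a FAISS inner-product index over DPR passage embeddings and retrieves context for a question. The DPR checkpoints shown are the public Facebook ones and stand in for the retrievers trained in the project; the passages and question are toy examples.

```python
import faiss
import torch
from transformers import (
    DPRContextEncoder, DPRContextEncoderTokenizer,
    DPRQuestionEncoder, DPRQuestionEncoderTokenizer,
)

# Placeholder DPR checkpoints; the project trained its own retrievers.
ctx_encoder = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
ctx_tokenizer = DPRContextEncoderTokenizer.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
q_encoder = DPRQuestionEncoder.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
q_tokenizer = DPRQuestionEncoderTokenizer.from_pretrained("facebook/dpr-question_encoder-single-nq-base")

passages = [
    "Ohm's law states that V = I * R for an ohmic conductor.",
    "A binary heap supports insertion and extraction in O(log n) time.",
]

# Encode passages and index them for inner-product (maximum dot-product) search.
with torch.no_grad():
    ctx_inputs = ctx_tokenizer(passages, padding=True, truncation=True, return_tensors="pt")
    ctx_embeddings = ctx_encoder(**ctx_inputs).pooler_output  # (num_passages, 768)

index = faiss.IndexFlatIP(ctx_embeddings.shape[1])
index.add(ctx_embeddings.numpy())

# Retrieve the top passage for a question and build the augmented prompt.
question = "What does Ohm's law relate?"
with torch.no_grad():
    q_inputs = q_tokenizer(question, return_tensors="pt")
    q_embedding = q_encoder(**q_inputs).pooler_output.numpy()

scores, ids = index.search(q_embedding, 1)
context = "\n".join(passages[i] for i in ids[0])
prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

In the project, the same pattern was scaled to a 100k-document STEM corpus, with the retrieved passages prepended to the MCQA prompt before generation.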

Key Achievements

  • Achieved 76% reward accuracy with the DPO model, a 20% improvement over the baseline
  • RAG variant for computer science questions reached 76% MCQA accuracy on a curated STEM test set
  • Reduced peak VRAM usage by 75–80% with 4-bit quantized models while preserving key capabilities (see the loading sketch after this list)
  • Built a FAISS-indexed corpus of 100k STEM documents and trained DPR retrievers for precise context retrieval
  • Compiled comprehensive technical reports analyzing trade-offs across architectures, domains, and evaluation methods
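For the quantized variant, the sketch below shows one common way to load a model in 4-bit NF4 via bitsandbytes through Hugging Face Transformers. The checkpoint name, prompt, and generation settings are illustrative assumptions, since the fine-tuned MCQA model itself is not shown here.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 weight quantization; matrix multiplications run in bfloat16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

model_id = "Qwen/Qwen3-0.6B"  # base checkpoint; a fine-tuned MCQA model would load the same way
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

prompt = (
    "Question: Which data structure gives O(log n) insertion?\n"
    "A) Array\nB) Heap\nC) Linked list\nD) Hash table\nAnswer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Storing the weights in 4-bit accounts for the bulk of the VRAM reduction, while keeping the compute dtype in bfloat16 helps limit the loss in MCQA accuracy.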

Technologies Used

Qwen3-0.6B
PyTorch
Hugging Face Transformers
FAISS
Dense Passage Retrieval (DPR)
Direct Preference Optimization (DPO)
Quantization