Educational Q&A with LLMs, RAG, and DPO
Large Language Models
Retrieval-Augmented Generation (RAG)
Direct Preference Optimization (DPO)
MCQA
Quantization
Educational AI
Transformer Models
École Polytechnique Fédérale de Lausanne (EPFL) • February – June 2025
LLM Developer
Project Overview
Completed as a semester-long project in the Modern Natural Language Processing (MNLP) course at EPFL, this project aimed to build a robust LLM-powered assistant for answering multiple-choice STEM questions (MCQA). Our team of four fine-tuned the Qwen3-0.6B model across four approaches: supervised MCQA training, retrieval-augmented generation (RAG) with FAISS and DPR, Direct Preference Optimization (DPO) on GPT-annotated preference pairs, and quantization for efficiency. Evaluations were performed on the MMLU benchmark and a curated STEM QA set.
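The core MCQA setup can be illustrated with a minimal sketch using Hugging Face Transformers: the question and its options are formatted into a single prompt and the model is asked to continue with one answer letter. The prompt template, decoding settings, and example question below are illustrative assumptions, not the project's exact pipeline.

```python
# Minimal MCQA sketch with Qwen3-0.6B (illustrative; not the project's exact prompt format).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-0.6B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

question = "Which data structure offers O(1) average-case lookup by key?"
choices = {"A": "Linked list", "B": "Hash table", "C": "Binary heap", "D": "Stack"}

# Format question and options so the model is expected to continue with a single letter.
prompt = question + "\n" + "\n".join(f"{k}. {v}" for k, v in choices.items()) + "\nAnswer:"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=2, do_sample=False)
answer = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(answer.strip())  # e.g. "B"
```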
Challenges
- Designing a domain-specific fine-tuning pipeline for a 0.6B parameter LLM
- Combining dense retrieval and generation in a performant RAG architecture (see the retrieval sketch after this list)
- Aligning model responses with human preferences via DPO
- Quantizing models without significant loss in MCQA accuracy
- Balancing factuality, explainability, and model efficiency in educational use
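The retrieval side of the RAG variant pairs DPR encoders with a FAISS index: passages are embedded once and indexed, and at question time the top-scoring passage is retrieved and prepended to the MCQA prompt. Below is a minimal sketch assuming off-the-shelf facebook/dpr-* checkpoints and a toy two-document corpus; the project trained its own DPR retrievers over a much larger STEM corpus.

```python
# Dense retrieval sketch: DPR encoders + FAISS inner-product index (illustrative checkpoints).
import faiss
import torch
from transformers import (DPRContextEncoder, DPRContextEncoderTokenizer,
                          DPRQuestionEncoder, DPRQuestionEncoderTokenizer)

ctx_tok = DPRContextEncoderTokenizer.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
ctx_enc = DPRContextEncoder.from_pretrained("facebook/dpr-ctx_encoder-single-nq-base")
q_tok = DPRQuestionEncoderTokenizer.from_pretrained("facebook/dpr-question_encoder-single-nq-base")
q_enc = DPRQuestionEncoder.from_pretrained("facebook/dpr-question_encoder-single-nq-base")

docs = ["Entropy measures the average information content of a source.",
        "A hash table offers O(1) average-case lookup by key."]

with torch.no_grad():
    # Encode each passage into a dense vector and add it to the index.
    doc_emb = ctx_enc(**ctx_tok(docs, padding=True, return_tensors="pt")).pooler_output
index = faiss.IndexFlatIP(doc_emb.shape[1])
index.add(doc_emb.numpy())

with torch.no_grad():
    q_emb = q_enc(**q_tok("What does entropy quantify?", return_tensors="pt")).pooler_output
scores, ids = index.search(q_emb.numpy(), 1)
print(docs[ids[0][0]])  # retrieved passage to prepend to the MCQA prompt
```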
Key Achievements
- Achieved 76% reward accuracy with the DPO model, a 20% improvement over the baseline
- RAG variant for computer science questions reached 76% MCQA accuracy on a curated STEM test set
- Reduced peak VRAM usage by 75–80% with 4-bit quantized models while preserving key capabilities (a loading sketch follows this list)
- Built a FAISS-indexed corpus of 100k STEM documents and trained DPR retrievers for precise context retrieval
- Compiled comprehensive technical reports analyzing trade-offs across architectures, domains, and evaluation methods
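The VRAM reductions above come from loading the model with 4-bit quantized weights. The sketch below shows one way to do this through the Transformers bitsandbytes integration, assuming NF4 quantization with bfloat16 compute; these settings are illustrative and may differ from the project's exact configuration.

```python
# 4-bit loading sketch via bitsandbytes (assumed NF4 + bf16 compute; settings are illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store weights in 4-bit format
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # dequantize to bf16 for matmuls
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")

prompt = "Answer with a single letter (A-D).\nWhat is 2 + 2?\nA. 3\nB. 4\nC. 5\nD. 6\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=2, do_sample=False)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```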
Technologies Used
Qwen3-0.6B
PyTorch
Hugging Face Transformers
FAISS
Dense Passage Retrieval (DPR)
Direct Preference Optimization (DPO)
Quantization