Towards Safe and Reliable Foundation Models

Jeonghyeon Kim is a Ph.D. student in Data Science at Seoul National University of Science and Technology (SeoulTech), advised by Prof. Sangheum Hwang. His research aims to build safe and reliable foundation models without sacrificing state-of-the-art performance. He focuses primarily on trustworthy multimodal learning, developing robust out-of-distribution (OoD) detection mechanisms and concept erasure techniques for generative models. He particularly enjoys interpreting these challenges through the theoretical lens of energy-based models (EBMs). Recently, he has proposed localized concept erasure methods for diffusion models based on high-level representation misdirection, and he is expanding his research into mechanistic interpretability to achieve transparent and safe knowledge unlearning in large language models (LLMs).