Homepage - Jeonghyeon Kim

Jeonghyeon Kim

Ph.D. student
Seoul National University of Science and Technology

mawjdgus(at)ds.seoultech.ac.kr

About Me

I am a Ph.D. student at Seoul National University of Science and Technology, advised by Sangheum Hwang.

The thread running through my work is understanding what a model does not know, and when it should not be trusted. On one side, this leads me to out-of-distribution detection and uncertainty estimation, often with large-scale vision-language models such as CLIP. On the other side lies the flip problem—not detecting knowledge, but removing it—which I explore through machine unlearning in LLMs and concept erasure in text-to-image diffusion models, asking how a deployed model can reliably forget or refuse what it should no longer retain.

Education

Seoul National University of Science and Technology

Department of Data Science
Ph.D. Student

Mar. 2023 - present
Seoul National University of Science and Technology

B.S. in Industrial Engineering

Sep. 2016 - Feb. 2022

Honors & Awards

Best Paper Award (KCC2026)

2026
Ph.D. Student Research Grant (Mechanistic Interpretability of Knowledge Unlearning in Foundation Models, 50M KRW / 2years, NRF)

Aug. 2025 - Aug. 2027
Best Paper Award (ICEIC2024)

2024
Best Paper Award (KCC2023)

2023

News

2026

Our paper on localized concept erasure for text-to-image diffusion models (HiRM) was accepted at ICLR 2026! 🎉

Jan 25

2025

Our paper on multi-modal fine-tuning for OOD detection was accepted at CVPR2025!🎉

Mar 23

Selected Publications (view all )

Reference-free LLM Unlearning Evaluation through Energy Trajectory Analysis

Jeonghyeon Kim, Sangheum Hwang

Korea Computer Congress (KCC) 2026 Oral Best Paper Award

We propose a method for evaluating machine unlearning in large language models (LLMs) without a reference model—which conventional evaluation requires by retraining on the retain dataset, making it impractical for real deployment—using only the forget dataset and the original model to approximate the reference model's average energy at the dataset level. Specifically, we progressively inject direction-fixed noise into answer embeddings of a question-answering dataset to observe an energy trajectory defined by the negative log-likelihood, and find that the energy near the inflection point on this landscape closely approximates the reference model's average energy. As a result, we confirm that under the TOFU benchmark setting, the unlearning evaluation rankings from our energy-trajectory-based analysis match those obtained using the reference model directly.

Reference-free LLM Unlearning Evaluation through Energy Trajectory Analysis

Jeonghyeon Kim, Sangheum Hwang

Korea Computer Congress (KCC) 2026 Oral Best Paper Award

Localized Concept Erasure in Text-to-Image Diffusion Models via High-Level Representation Misdirection

Uichan Lee*, Jeonghyeon Kim*, Sangheum Hwang (* equal contribution)

International Conference on Learning Representations (ICLR) 2026

We propose High-Level Representation Misdirection (HiRM), a concept erasure method for text-to-image diffusion models that, unlike prior work targeting the denoiser, fine-tunes only the early text-encoder layers where causal studies show visual attributes are localized. HiRM misdirects the high-level semantic representations of target concepts toward designated vectors (e.g., random or super-category directions), enabling precise concept removal with minimal impact on unrelated concepts. It achieves strong results on the UnlearnCanvas and NSFW benchmarks across diverse targets, preserves generative utility at low cost, transfers to architectures like Flux without additional training, and works synergistically with denoiser-based erasure methods.

[Paper] [Code]

Localized Concept Erasure in Text-to-Image Diffusion Models via High-Level Representation Misdirection

Uichan Lee*, Jeonghyeon Kim*, Sangheum Hwang (* equal contribution)

International Conference on Learning Representations (ICLR) 2026

[Paper] [Code]

Enhanced OoD Detection through Cross-Modal Alignment of Multi-Modal Representations

Jeonghyeon Kim, Sangheum Hwang

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2025

We show that multi-modal fine-tuning (MMFT) of vision-language models like CLIP can achieve strong out-of-distribution detection (OoDD), and identify the modality gap in in-distribution embeddings as a key limitation of naïve fine-tuning. To address this, we propose a training objective that enhances cross-modal alignment by regularizing the distances between image and text embeddings, which we theoretically show corresponds to maximum likelihood estimation of an energy-based model on a hypersphere. Combined with post-hoc methods such as NegLabel, our approach achieves state-of-the-art OoDD performance and leading ID accuracy on ImageNet-1k benchmarks.

[Paper] [Code]

Enhanced OoD Detection through Cross-Modal Alignment of Multi-Modal Representations

Jeonghyeon Kim, Sangheum Hwang

IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2025

[Paper] [Code]

Comparison of Out-of-Distribution Detection Performance of CLIP-based Fine-Tuning Methods

Jeonghyeon Kim, Jihyo KIm, Sangheum Hwang

International Conference on Electronics, Information, and Communication (ICEIC) 2024 Oral Best Paper Award

We present a comprehensive comparison of CLIP-based fine-tuning methods, evaluating not only in-distribution classification but also out-of-distribution (OOD) detection, which is often overlooked despite its importance for model reliability. Using the OpenOOD v1.5 benchmark, we analyze how different fine-tuning strategies affect both aspects of performance. Our results show that fine-tuning the entire model achieves the best performance in both classification and OOD detection under few-shot settings.

[Paper]

Comparison of Out-of-Distribution Detection Performance of CLIP-based Fine-Tuning Methods

Jeonghyeon Kim, Jihyo KIm, Sangheum Hwang

International Conference on Electronics, Information, and Communication (ICEIC) 2024 Oral Best Paper Award

[Paper]

Enhancing Out-of-Distribution Detection Performance of CLIP Based on Fine-tuning with Random Texts

Jeonghyeon Kim, Jihyo Kim, Sangheum Hwang

Korea Computer Congress (KCC) 2023 Oral Best Paper Award

We propose a method to improve the out-of-distribution (OOD) detection performance of CLIP, a vision-language model that learns from both images and text. Specifically, we apply contrastive fine-tuning using arbitrary text to a CLIP model pre-trained on large-scale internet image-text pairs. This work demonstrates the feasibility of leveraging multimodal approaches for OOD detection.

[Paper]

Enhancing Out-of-Distribution Detection Performance of CLIP Based on Fine-tuning with Random Texts

Jeonghyeon Kim, Jihyo Kim, Sangheum Hwang

Korea Computer Congress (KCC) 2023 Oral Best Paper Award

[Paper]

Warning

Action required

Education

Honors & Awards

News

Selected Publications (view all )

Reference-free LLM Unlearning Evaluation through Energy Trajectory Analysis

Reference-free LLM Unlearning Evaluation through Energy Trajectory Analysis

Localized Concept Erasure in Text-to-Image Diffusion Models via High-Level Representation Misdirection

Localized Concept Erasure in Text-to-Image Diffusion Models via High-Level Representation Misdirection

Enhanced OoD Detection through Cross-Modal Alignment of Multi-Modal Representations

Enhanced OoD Detection through Cross-Modal Alignment of Multi-Modal Representations

Comparison of Out-of-Distribution Detection Performance of CLIP-based Fine-Tuning Methods

Comparison of Out-of-Distribution Detection Performance of CLIP-based Fine-Tuning Methods

Enhancing Out-of-Distribution Detection Performance of CLIP Based on Fine-tuning with Random Texts

Enhancing Out-of-Distribution Detection Performance of CLIP Based on Fine-tuning with Random Texts

All publications