Xuan Luo (罗炫)

xuan_luo AT ucsb DOT edu / GitHub / Google Scholar / Hugging Face / CV

I’m a third-year PhD student at the University of California, Santa Barbara, advised by Prof. Xifeng Yan. My research focuses on efficient large language models and applications of generative models.

Selected publications

2025
Direct Multi-Token Decoding
Xuan Luo, Weizhi Wang, Xifeng Yan
Preprint
ABSTRACT
Decoder-only transformers have become the standard architecture for large language models (LLMs) due to their strong performance. Recent studies suggest that, in pre-trained LLMs, early, middle, and late layers may serve distinct roles: Early layers focus on understanding the input context, middle layers handle task-specific processing, and late layers convert abstract representations into output tokens. We hypothesize that once representations have been processed by the early and middle layers, the resulting hidden states may encapsulate sufficient information to support the generation of multiple tokens using only the late layers, eliminating the need to repeatedly traverse the early and middle layers. We refer to this inference paradigm as Direct Multi-Token Decoding (DMTD). Unlike speculative decoding, our method introduces no additional parameters, auxiliary routines, or post-generation verification. Despite being trained on a limited dataset, a fine-tuned DMTD Qwen3-4B model has already demonstrated promising results, achieving up to a 2× speedup with only minor performance loss. Moreover, as shown in our scaling analysis, its performance is expected to further improve with larger training datasets.
PDF CODE WEBSITE
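The decoding loop described above can be illustrated with a minimal, hypothetical sketch: the early and middle layers are traversed once for the current context, and the resulting hidden states are then extended for several tokens by the late layers alone. The module split, the toy dimensions, and names such as dmtd_generate and k are assumptions for illustration, not the released implementation.

```python
# Hypothetical sketch of Direct Multi-Token Decoding (DMTD): reuse hidden states
# from the early+middle layers to decode several tokens with only the late layers.
# All modules below are toy stand-ins, not the released code.
import torch
import torch.nn as nn

d_model, vocab = 64, 1000
early_middle = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)  # stands in for early + middle layers
late = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)          # stands in for late layers
embed = nn.Embedding(vocab, d_model)
lm_head = nn.Linear(d_model, vocab)

def dmtd_generate(prompt_ids, max_new_tokens=8, k=4):
    """Greedy decoding that traverses the early+middle layers only once per k tokens."""
    ids = prompt_ids.clone()
    while ids.size(1) - prompt_ids.size(1) < max_new_tokens:
        # One full pass through the early and middle layers for the current context.
        h = early_middle(embed(ids))
        # Decode up to k tokens using only the late layers.
        for _ in range(k):
            logits = lm_head(late(h)[:, -1])
            next_id = logits.argmax(-1, keepdim=True)
            ids = torch.cat([ids, next_id], dim=1)
            # Append the new token's embedding; on the next inner step its hidden
            # state is produced by the late layers alone.
            h = torch.cat([h, embed(next_id)], dim=1)
    return ids

print(dmtd_generate(torch.randint(0, vocab, (1, 5))).shape)
```

In a real model the late layers would also consume the key-value cache of the earlier context; the toy stand-in layers above simply re-read the running hidden states.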
Adaptive Layer-skipping in Pre-trained LLMs
Xuan Luo, Weizhi Wang, Xifeng Yan
COLM 2025
Oral Presentation
ABSTRACT
Various layer-skipping methods have been proposed to accelerate token generation in large language models (LLMs). However, they have overlooked a fundamental question: How do computational demands vary across the generation of different tokens? In this work, we introduce FlexiDepth, a method that dynamically adjusts the number of Transformer layers used in text generation. By incorporating a plug-in router and adapter, FlexiDepth enables adaptive layer-skipping in LLMs without modifying their original parameters. Applying FlexiDepth to the Llama-3-8B model skips 8 of its 32 layers while maintaining full (100%) benchmark performance. Experimental results with FlexiDepth demonstrate that computational demands in LLMs vary significantly based on token type. Specifically, generating repetitive tokens or fixed phrases requires fewer layers, whereas producing tokens involving computation or high uncertainty requires more layers. Interestingly, this adaptive allocation pattern aligns with human intuition. To advance research in this area, we open-sourced FlexiDepth and a dataset documenting FlexiDepth's layer allocation patterns for future exploration.
PDF CODE DATASET WEBSITE
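A minimal sketch of the plug-in routing idea the abstract describes: a small router decides per token whether a frozen decoder layer is executed or bypassed through a lightweight adapter. The class name, the soft gating scheme, and the dimensions are illustrative assumptions, not the released code.

```python
import torch
import torch.nn as nn

class FlexiDepthLayer(nn.Module):
    """Wraps a frozen decoder layer with a per-token router and adapter (illustrative)."""
    def __init__(self, frozen_layer, d_model):
        super().__init__()
        self.layer = frozen_layer                    # original layer, parameters kept frozen
        self.router = nn.Linear(d_model, 1)          # plug-in: scores whether to execute the layer
        self.adapter = nn.Linear(d_model, d_model)   # plug-in: light transform on the skip path
        for p in self.layer.parameters():
            p.requires_grad_(False)

    def forward(self, h):
        gate = torch.sigmoid(self.router(h))         # (batch, seq, 1): per-token execution probability
        full = self.layer(h)                         # full-compute path
        skip = h + self.adapter(h)                   # cheap path for "easy" tokens
        # Soft mix for fine-tuning; at inference one would hard-threshold the gate
        # and run the layer only for tokens with gate > 0.5.
        return gate * full + (1 - gate) * skip

d = 64
layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
flexi = FlexiDepthLayer(layer, d)
print(flexi(torch.randn(2, 10, d)).shape)  # torch.Size([2, 10, 64])
```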
DiffSkip: Differential Layer Skipping in Large Language Models
Xuan Luo, Weizhi Wang, Xifeng Yan
ACL Findings 2025
ABSTRACT
Existing Large Language Models (LLMs) enforce uniform computation across all tokens. We analyze the correlation between the input-output differences of the self-attention block and the Feed-Forward Network (FFN) within the same transformer layer, and find that these two differential vectors are highly correlated. Thus, we propose to dynamically skip FFN blocks based on the self-attention difference and introduce Differential Layer Skipping (DiffSkip) to show that LLMs are inherently dynamic-depth models, capable of adjusting computational depth when generating different tokens. DiffSkip employs a lightweight router module to dynamically skip a set of FFN blocks in LLMs and requires only efficient fine-tuning while keeping the whole LLM frozen. Experimental results demonstrate that DiffSkip effectively enables dynamic FFN skipping in decoder-only language models, even in continuous token generation tasks where many layer-skipping methods struggle.
PDF CODE
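A rough sketch of the mechanism the abstract describes, using the self-attention input-output difference as the routing signal for the FFN. The class, the soft gate, and the layer layout are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class DiffSkipBlock(nn.Module):
    """One transformer layer whose FFN can be skipped based on the attention difference (illustrative)."""
    def __init__(self, d_model, nhead=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.router = nn.Linear(d_model, 1)  # lightweight router on the attention difference

    def forward(self, h):
        attn_out, _ = self.attn(h, h, h)
        # With a residual connection, attn_out is exactly the input-output
        # difference of the self-attention block; it serves as the routing signal.
        h = h + attn_out
        gate = torch.sigmoid(self.router(attn_out))  # per-token probability of running the FFN
        # Soft gating for fine-tuning; at inference, tokens with gate <= 0.5
        # would bypass the FFN entirely to save compute.
        return h + gate * self.ffn(h)

block = DiffSkipBlock(64)
print(block(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```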
2024
Bot or Human? Detecting ChatGPT Imposters with a Single Question
Hong Wang, Xuan Luo, Weizhi Wang, Xifeng Yan
COLM 2024
ABSTRACT
Large language models (LLMs) like GPT-4 have recently demonstrated impressive capabilities in natural language understanding and generation. However, there is a concern that they can be misused for malicious purposes, such as fraud or denial-of-service attacks. Therefore, it is crucial to develop methods for detecting whether the party involved in a conversation is a bot or a human. In this paper, we propose a framework named FLAIR, Finding Large Language Model Authenticity via a Single Inquiry and Response, to detect conversational bots in an online manner. Specifically, we target a single-question scenario that can effectively differentiate human users from bots. The questions are divided into two categories: those that are easy for humans but difficult for bots (e.g., counting, substitution, searching, and ASCII art reasoning), and those that are easy for bots but difficult for humans (e.g., memorization and computation). Our approach demonstrates the differing effectiveness of these questions, providing a new way for online service providers to protect themselves against nefarious activities. Our code and question set are available at https://github.com/hongwang600/FLAIR.
PDF CODE DATASET
2022
Progressive Attentional Manifold Alignment for Arbitrary Style Transfer
Xuan Luo, Zhen Han, Linkang Yang
ACCV 2022
ABSTRACT
Arbitrary style transfer algorithms can generate stylization results with arbitrary content-style image pairs but will distort content structures and bring degraded style patterns. The content distortion problem has been well addressed using high-frequency signals, saliency maps, and low-level features. However, the style degradation problem is still unsolved. Since there is a considerable semantic discrepancy between content and style features, we assume they follow two different manifold distributions. The style degradation happens because existing methods cannot fully leverage the style statistics to render the content feature that lies on a different manifold. Therefore, we design the progressive attentional manifold alignment (PAMA) module to align the content manifold to the style manifold. This module consists of a channel alignment module to emphasize related content and style semantics, an attention module to establish the correspondence between features, and a spatial interpolation module to adaptively align the manifolds. The proposed PAMA can alleviate the style degradation problem and produce state-of-the-art stylization results.
PDF CODE
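A simplified sketch of one alignment stage built from the three modules the abstract names (channel alignment, attention, and spatial interpolation). Shapes, layer choices, and names are illustrative assumptions; the full method applies such stages progressively.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PAMAStage(nn.Module):
    """One attentional manifold alignment stage (illustrative, not the released code)."""
    def __init__(self, channels):
        super().__init__()
        self.content_proj = nn.Conv2d(channels, channels, 1)  # channel alignment: re-weight content semantics
        self.style_proj = nn.Conv2d(channels, channels, 1)    # channel alignment: re-weight style semantics
        self.q = nn.Conv2d(channels, channels, 1)
        self.k = nn.Conv2d(channels, channels, 1)
        self.interp = nn.Conv2d(channels, 1, 1)               # spatial interpolation: per-pixel blend weight

    def forward(self, content, style):
        b, c, h, w = content.shape
        content = self.content_proj(content)
        style = self.style_proj(style)
        # Attention establishes correspondences between content and style positions.
        q = self.q(content).flatten(2).transpose(1, 2)         # (b, hw, c)
        k = self.k(style).flatten(2)                           # (b, c, hw_s)
        attn = F.softmax(q @ k / c ** 0.5, dim=-1)             # (b, hw, hw_s)
        v = style.flatten(2).transpose(1, 2)                   # (b, hw_s, c)
        aligned = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        # Spatial interpolation adaptively blends the content feature with the aligned style feature.
        alpha = torch.sigmoid(self.interp(content))
        return alpha * aligned + (1 - alpha) * content

stage = PAMAStage(64)
out = stage(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
print(out.shape)  # torch.Size([1, 64, 32, 32])
```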
Computer Science Diagram Understanding with Topology Parsing
Shaowei Wang, Lingling Zhang, Xuan Luo, Yi Yang, Xin Hu, Tao Qin, Jun Liu
TKDD 2022
ABSTRACT
Diagrams are a special form of visual expression for representing complex concepts, logic, and knowledge, and they appear widely in educational settings such as textbooks, blogs, and encyclopedias. Current research on diagrams focuses primarily on natural disciplines such as Biology and Geography, whose expressions are still similar to natural images. In this article, we construct the first dataset of the novel geometric type of diagrams in the Computer Science field, which have more abstract expressions and complex logical relations. The dataset has exhaustive annotations of objects and relations for about 1,300 diagrams and 3,500 question-answer pairs. We introduce the tasks of diagram classification (DC) and diagram question answering (DQA) based on the new dataset, and propose the Diagram Parsing Net (DPN), which focuses on analyzing the topological structure and text information of diagrams. We use DPN-based models to solve the DC and DQA tasks, and compare their performance to well-known natural image classification models and visual question answering models. Our experiments show the effectiveness of the proposed DPN-based models on diagram understanding tasks, and also indicate that our dataset is more complex than previous natural image understanding datasets. The presented dataset opens new challenges for research in diagram understanding, and the DPN method provides a novel perspective for studying such data. Our dataset is available at https://github.com/WayneWong97/CSDia.
PDF CODE