Bio

Pengfei Zhang is a Ph.D. student at UC Irvine whose research focuses on intelligent systems at the intersection of large language models (LLMs), speech, and vision. Most of his work centers on building multimodal agent frameworks that unify language, audio, and visual modalities to enhance decision-making in human-centric tasks. His projects span LLM-enhanced automatic speech recognition, multimodal alignment of LLMs for speech and visual understanding and generation, and agent-based personalized healthcare recommendation systems. His resume is available in Pengfei’s CV.


🎇 NEWS

06/2025: 👏 One paper, [ContextualGesture], is accepted by ACM MM 2025!
06/2025: 👏 One paper, [KinMo], is accepted by ICCV 2025!
06/2025: 👊 Started an Applied Scientist Intern position at Amazon Web Services AI Lab!
06/2025: One paper, [Adaptive LLM Retrieval], is accepted by AMIA 2025!
12/2024: 👏 One paper, [DEMENTIA-PLAN], is accepted by an AAAI 2025 workshop!
06/2024: 👊 Started a Research Scientist Intern position at Flawless AI, Inc.!
10/2023: One paper, [Handformer2T], is accepted by WACV 2024!

👀 Research

Multimodal Alignment of LLMs for Speech and Visual Understanding and Generation

Jan 2023 - Present  
Multimodal alignment of LLMs for speech and visual understanding and generation aims to bridge language, audio, and vision into a unified semantic space. By synchronizing temporal and contextual cues across modalities, such as aligning speech with co-speech gestures or visual actions, LLMs can generate coherent, context-aware outputs that reflect human-like perception and behavior. This enables applications like gesture generation, motion synthesis, and audiovisual narration grounded in natural language.
Publications:  
C2.3 KinMo: Kinematic-aware Human Motion Understanding and Generation.
Pengfei Zhang, Pinxin Liu, Pablo Garrido, Hyeongwoo Kim, Bindita Chaudhuri.
ICCV 2025
[project] [preprint] [demo]
C2.2 Contextual Gesture: Co-Speech Gesture Video Generation through Semantic-aware Gesture Representation.
Pinxin Liu, Pengfei Zhang, Hyeongwoo Kim, Pablo Garrido, Ari Shapiro, Kyle Olszewski.
ACM MM 2025
[project] [preprint]
C2.1 Handformer2T: A Lightweight Regression-based model for Interacting Hands Pose Estimation from a single RGB Image.
Pengfei Zhang, Deying Kong.
WACV 2024
[paper]

LLM-enhanced Automatic Speech Recognition

Jan 2023 - Present  
LLM-enhanced automatic speech recognition (ASR) integrates large language models with traditional ASR pipelines to improve transcription accuracy, particularly in domain-specific and noisy scenarios. By leveraging contextual understanding and external knowledge sources, such as medical knowledge graphs, LLMs can correct recognition errors, disambiguate terms, and enhance spoken question answering. This hybrid approach enables more robust and semantically informed speech understanding in complex applications like healthcare.
Publications:  
C1.2 MedSpeak: Knowledge Enhanced ASR Error Correction framework for Spoken Medical Question Answering

Agent-based Personalized Healthcare Recommendation Systems

Sep 2021 - Present  
Agent-based personalized healthcare recommendation systems leverage autonomous software agents to analyze individual patient data, such as medical history, lifestyle, and preferences, to deliver tailored health advice and treatment options. These systems can interact dynamically with users and other agents (e.g., diagnostic or monitoring tools), adapting recommendations in real time to improve decision-making and patient outcomes.
Publications:  
C3.4 Adaptive Constraint Relaxation in Personalized Nutrition Recommendations: An LLM-Driven Knowledge Graph Retrieval Approach
Pengfei Zhang, Mohbat Fnu, Yutong Song, Oshani Seneviratne, Zhongqi Yang, Iman Azimi, Amir M. Rahmani
AMIA 2025 (American Medical Informatics Association)
[preprint]
C3.3 DEMENTIA-PLAN: An Agent-Based Framework for Multi-Knowledge Graph Retrieval-Augmented Generation in Dementia Care.
Yutong Song, Chenhan Lyu, Pengfei Zhang, Sabine Brunswicker, Nikil Dutt, Amir M. Rahmani.
AAAI 2025 Workshop: Knowledge Graphs for Health Equity, Justice, and Social Services
[preprint]
C3.2 Knowledge-Infused LLM-Powered Conversational Health Agent: A Case Study for Diabetes Patients.
Mahyar Abbasian, Zhongqi Yang, Elahe Khatibi, Pengfei Zhang, Nitish Nagesh, Iman Azimi, Ramesh Jain, Amir M. Rahmani.
EMBC 2024
[paper] [code] [website]
C3.1 CLMB: deep contrastive learning for robust metagenomic binning
Pengfei Zhang, Zhengyuan Jiang, Yixuan Wang, Yu Li.
RECOMB 2022 (oral)
[paper] [code] [blog]

💪 Internships

2025 Applied Scientist Intern at Amazon Web Services AI Lab. Location: Santa Clara

2024 Research Scientist Intern at Flawless AI, Inc. Location: Los Angeles

2022 Research Intern at The Chinese University of Hong Kong. Location: Hong Kong

📽️ Projects

Personal Software Programming Projects  
[Distributed Chatroom with LLaMa-Powered Summarization] A multi-topic web chatroom that backs up previous conversations and provides LLaMa-powered summaries after each refresh.

🎖️ Awards

Dean’s award from UCI

National Encouragement Scholarship (top 20%) from USTC