About me

Short Bio

I am Jing Peng (彭景), a first-year Zhiyuan Honors Ph.D. student in the X-LANCE Lab at Shanghai Jiao Tong University (SJTU), advised by Prof. Kai Yu and co-advised by Prof. Shinji Watanabe.

My research focuses on Speech Large Language Models (Speech LLMs), with an emphasis on building well-aligned speech understanding systems that are robust to domain shift and multi-speaker conditions.


Basic Information


Education

  • Shanghai Jiao Tong University (SJTU), Shanghai, China
    Zhiyuan Honors Ph.D. Program, X-LANCE Lab, College of Computer Science
    Ph.D. Student, Sep 2025 – Present

  • Xi’an Jiaotong University (XJTU), Xi’an, China
    B.Eng. in Automation, minor in Electrical Engineering
    Qian Xuesen Honor Class
    Sep 2021 – Jun 2025

  • University of California, Berkeley, Berkeley, USA
    Exchange Student, Berkeley Global Access (BGA), College of Engineering
    Aug 2023 – Dec 2023


Research Interests

Broadly, my work centers on Speech Large Language Models (Speech LLMs) for speech understanding and reasoning:

  • Speaker-attributed ASR (SA-ASR) and multi-speaker understanding
  • Multimodal alignment between speech and text for instruction-following speech systems
  • Efficient adaptation (text-only fine-tuning, lightweight adapters) for low-resource / cross-domain settings

Research Experience

My recent work spans academic labs and industry research groups:

  • Speech LLMs for Speech Understanding (AISpeech, Suzhou, Jiangsu)
    I work on ASR and multimodal alignment methods that connect speech representations with language model reasoning and instruction following.

  • SA-ASR with Speech LLMs (Shenzhen Research Institute of Big Data, Remote)
    I explore Speech LLM-based frameworks for speaker-attributed transcription, aiming to improve speaker consistency and controllability in multi-speaker scenarios.

  • Speaker Discrimination on Omni/SLM (Hi Lab, Xiaohongshu, Shanghai)
    I study explicit speaker discrimination and implicit speaker selection strategies for multi-speaker understanding, with an eye toward robust speaker identity modeling under real-world conditions.


Publications (Selected)

  * indicates equal contribution.
  • A Survey on Speech Large Language Models for Understanding
    Jing Peng, Y. Wang, Y. Fang, Y. Xi, X. Li, X. Zhang, K. Yu.
    arXiv:2410.18908. Accepted by IEEE JSTSP.
    https://arxiv.org/abs/2410.18908

  • TASU: Text-Only Alignment for Speech Understanding
    Jing Peng, Y. Yang, X. Li, Y. Xi, Q. Tang, Y. Fang, J. Li, K. Yu.
    Accepted by ICASSP 2026.

  • Low-Resource Domain Adaptation for Speech LLMs via Text-Only Fine-Tuning
    Y. Fang, Jing Peng, X. Li, Y. Xi, C. Zhang, G. Zhong, K. Yu.
    arXiv:2506.05671. Accepted by ASRU 2025.
    https://arxiv.org/abs/2506.05671

  • MOSA: Mixtures of Simple Adapters Outperform Monolithic Approaches in LLM-based Multilingual ASR
    J. Li, Jing Peng, Y. Fang, S. Wang, K. Yu.
    arXiv:2508.18998. Accepted by ICASSP 2026.
    https://arxiv.org/abs/2508.18998

  • Fewer Hallucinations, More Verification: A Three-Stage LLM-Based Framework for ASR Error Correction
    Y. Fang, B. Cheng, Jing Peng, X. Li, Y. Xi, C. Zhang, G. Zhong.
    arXiv:2505.24347. Accepted by ASRU 2025.
    https://arxiv.org/abs/2505.24347

  • ISA-Bench: Benchmarking Instruction Sensitivity for Large Audio Language Models
    B. Li, W. Huang, Y. Qiu, Y. Guo, H. Wang, Z. Li, Jing Peng, Z. Ma, X. Chen, K. Yu.
    arXiv:2510.23558. Accepted by ICASSP 2026.
    https://arxiv.org/abs/2510.23558


Resume

View or download my resume on the resume page.


Personal Information

I am originally from Hengyang, a beautiful city in Hunan, China. Outside of research, I enjoy exploring local food and traveling, and I am an avid fan of sports such as badminton and basketball.

If you are interested in my research directions, feel free to reach out—I am happy to discuss potential collaborations.