About me
Short Bio
I am Jing Peng (彭景), a first-year Zhiyuan Honors Ph.D. student in the X-LANCE Lab at Shanghai Jiao Tong University (SJTU), advised by Prof. Kai Yu and co-advised by Prof. Shinji Watanabe.
My research focuses on Speech Large Language Models (Speech LLMs), with an emphasis on building well-aligned speech understanding systems that are robust to domain shift and multi-speaker conditions.
Basic Information
- Date of Birth: 2003-10-06
- Hometown: Hengyang, Hunan, China
- Languages: Mandarin, English
- Email: jing.peng@sjtu.edu.cn
- GitHub: https://github.com/PigeonDan1
- Google Scholar: https://scholar.google.com/citations?user=Uo0mj0AAAAAJ&hl=en
- LinkedIn: https://www.linkedin.com/in/jing-peng-7ab8682a4/
Education
- Shanghai Jiao Tong University (SJTU), Shanghai, China
  Zhiyuan Honors Ph.D. Program, X-LANCE Lab, College of Computer Science
  Ph.D. Student, Sep 2025 – Present
- Xi’an Jiaotong University (XJTU), Xi’an, China
  B.Eng. in Automation, minor in Electrical Engineering
  Qian Xuesen Honor Class, Sep 2021 – Jun 2025
- University of California, Berkeley, Berkeley, USA
  Exchange Student, Berkeley Global Access (BGA), College of Engineering
  Aug 2023 – Dec 2023
Research Interests
Broadly, my work centers on Speech Large Language Models (Speech LLMs) for speech understanding and reasoning:
- Speaker-attributed ASR (SA-ASR) and multi-speaker understanding
- Multimodal alignment between speech and text for instruction-following speech systems
- Efficient adaptation (text-only fine-tuning, lightweight adapters) for low-resource / cross-domain settings
Research Experience
My recent work spans both academic labs and industry research:
- Speech LLMs for Speech Understanding (AISpeech, Suzhou, Jiangsu)
  I work on ASR and multimodal alignment methods that connect speech representations with language model reasoning and instruction following.
- SA-ASR with Speech LLMs (Shenzhen Research Institute of Big Data, Remote)
  I explore Speech LLM-based frameworks for speaker-attributed transcription, aiming to improve speaker consistency and controllability in multi-speaker scenarios.
- Speaker Discrimination on Omni/SLM (Hi Lab, Xiaohongshu, Shanghai)
  I study explicit speaker discrimination and implicit speaker selection strategies for multi-speaker understanding, with an eye toward robust speaker identity modeling under real-world conditions.
Publications (Selected)
- indicates equal contribution.
- A Survey on Speech Large Language Models for Understanding
  Jing Peng, Y. Wang, Y. Fang, Y. Xi, X. Li, X. Zhang, K. Yu.
  arXiv:2410.18908. Accepted by IEEE JSTSP.
  https://arxiv.org/abs/2410.18908
- TASU: Text-Only Alignment for Speech Understanding
  Jing Peng, Y. Yang, X. Li, Y. Xi, Q. Tang, Y. Fang, J. Li, K. Yu.
  Accepted by ICASSP 2026.
- Low-Resource Domain Adaptation for Speech LLMs via Text-Only Fine-Tuning
  Y. Fang, Jing Peng, X. Li, Y. Xi, C. Zhang, G. Zhong, K. Yu.
  arXiv:2506.05671. Accepted by ASRU 2025.
  https://arxiv.org/abs/2506.05671
- MOSA: Mixtures of Simple Adapters Outperform Monolithic Approaches in LLM-based Multilingual ASR
  J. Li, Jing Peng, Y. Fang, S. Wang, K. Yu.
  arXiv:2508.18998. Accepted by ICASSP 2026.
  https://arxiv.org/abs/2508.18998
- Fewer Hallucinations, More Verification: A Three-Stage LLM-Based Framework for ASR Error Correction
  Y. Fang, B. Cheng, Jing Peng, X. Li, Y. Xi, C. Zhang, G. Zhong.
  arXiv:2505.24347. Accepted by ASRU 2025.
  https://arxiv.org/abs/2505.24347
- ISA-Bench: Benchmarking Instruction Sensitivity for Large Audio Language Models
  B. Li, W. Huang, Y. Qiu, Y. Guo, H. Wang, Z. Li, Jing Peng, Z. Ma, X. Chen, K. Yu.
  arXiv:2510.23558. Accepted by ICASSP 2026.
  https://arxiv.org/abs/2510.23558
Resume
Resume page: View or download my resume
Personal Information
I am originally from Hengyang, a beautiful city in Hunan, China. Outside of research, I enjoy exploring local food and traveling, and I am an avid fan of sports such as badminton and basketball.
If you are interested in my research directions, feel free to reach out; I am always happy to discuss potential collaborations.