I am a Zhiyuan Honors Ph.D. student at Shanghai Jiao Tong University (SJTU), X-LANCE Lab, advised by Prof. Kai Yu and co-advised by Prof. Shinji Watanabe.
My research focuses on Speech Large Language Models (Speech LLMs), with an emphasis on building well-aligned speech understanding systems that are robust to domain shift and multi-speaker conditions.
Research Interests
- Multimodal alignment between speech and text for instruction-following speech systems
- Efficient adaptation for low-resource / cross-domain settings
- Speaker-attributed ASR (SA-ASR) and multi-speaker understanding
Publications (Selected)
* indicates equal contribution.
G-STAR: End-to-End Global Speaker-Tracking Attributed Recognition
Jing Peng, Z. Chen, H. Li, Y. Wang, D. Ma, M. Li, Y. Du, D. Xu, K. Yu, S. Wang.
arXiv:2603.10468.
Submitted to Interspeech 2026.
arXiv
A Survey on Speech Large Language Models for Understanding
Jing Peng*, Y. Wang*, Y. Fang, Y. Xi, X. Li, X. Zhang, K. Yu.
arXiv:2410.18908.
Accepted by IEEE JSTSP.
arXiv
TASU: Text-Only Alignment for Speech Understanding
Jing Peng, Y. Yang, X. Li, Y. Xi, Q. Tang, Y. Fang, J. Li, K. Yu.
arXiv:2511.03310.
Accepted by ICASSP 2026.
arXiv
Contact Information
Publications
* indicates equal contribution.
📂 Multi-Speaker Speech Understanding
G-STAR: End-to-End Global Speaker-Tracking Attributed Recognition
Jing Peng, Z. Chen, H. Li, Y. Wang, D. Ma, M. Li, Y. Du, D. Xu, K. Yu, S. Wang
Interspeech 2026 (submitted), 2026
arXiv
📂 Speech LLM Survey & Benchmark
A Survey on Speech Large Language Models for Understanding
Jing Peng*, Y. Wang*, Y. Fang, Y. Xi, X. Li, X. Zhang, K. Yu
IEEE JSTSP (accepted), 2024
arXiv
ISA-Bench: Benchmarking Instruction Sensitivity for Large Audio Language Models
B. Li, W. Huang, Y. Qiu, Y. Guo, H. Wang, Z. Li, Jing Peng, Z. Ma, X. Chen, K. Yu
ICASSP 2026 (accepted), 2026
arXiv
📂 Speech LLM Alignment
TASU: Text-Only Alignment for Speech Understanding
Jing Peng, Y. Yang, X. Li, Y. Xi, Q. Tang, Y. Fang, J. Li, K. Yu
ICASSP 2026 (accepted), 2026
📂 Speech LLM Domain Adaptation
Low-Resource Domain Adaptation for Speech LLMs via Text-Only Fine-Tuning
Y. Fang*, Jing Peng*, X. Li, Y. Xi, C. Zhang, G. Zhong, K. Yu
ASRU 2025 (accepted), 2025
arXiv
📂 Speech LLM Modular Adaptation
MOSA: Mixtures of Simple Adapters Outperform Monolithic Approaches in LLM-based Multilingual ASR
J. Li, Jing Peng, Y. Fang, S. Wang, K. Yu
ICASSP 2026 (accepted), 2026
arXiv