About me

Short Bio

I am Jing Peng (彭景), a first-year Zhiyuan Honors Ph.D. student in the X-LANCE Lab at Shanghai Jiao Tong University (SJTU), advised by Prof. Kai Yu and co-advised by Prof. Shinji Watanabe.

My research focuses on Speech Large Language Models (Speech LLMs), with an emphasis on building well-aligned speech understanding systems that are robust to domain shift and multi-speaker conditions.


Basic Information


Education

  • Shanghai Jiao Tong University (SJTU), Shanghai, China
    Zhiyuan Honors Ph.D. Program, X-LANCE Lab, College of Computer Science
    Ph.D. Student, Sep 2025 – Present

  • Xi’an Jiaotong University (XJTU), Xi’an, China
    B.Eng. in Automation, minor in Electrical Engineering
    Qian Xuesen Honor Class
    Sep 2021 – Jun 2025

  • University of California, Berkeley, Berkeley, USA
    Exchange Student, Berkeley Global Access (BGA), College of Engineering
    Aug 2023 – Dec 2023


Research Interests

Broadly, my work centers on Speech Large Language Models (Speech LLMs) for speech understanding and reasoning:

  • Speaker-attributed ASR (SA-ASR) and multi-speaker understanding
  • Multimodal alignment between speech and text for instruction-following speech systems
  • Efficient adaptation (text-only fine-tuning, lightweight adapters) for low-resource / cross-domain settings

Research Experience

My recent work spans academic labs and industry research groups:

  • Speech LLMs for Speech Understanding (AISpeech, Suzhou, Jiangsu)
    I work on ASR and multimodal alignment methods that connect speech representations with language model reasoning and instruction following.

  • SA-ASR with Speech LLMs (Shenzhen Research Institute of Big Data, Remote)
    I explore Speech LLM-based frameworks for speaker-attributed transcription, aiming to improve speaker consistency and controllability in multi-speaker scenarios.

  • Speaker Discrimination on Omni/SLM (Hi Lab, Xiaohongshu, Shanghai)
    I study explicit speaker discrimination and implicit speaker selection strategies for multi-speaker understanding, with an eye toward robust speaker identity modeling under real-world conditions.


Publications (Selected)

  * indicates equal contribution.
  • A Survey on Speech Large Language Models for Understanding
    Jing Peng, Y. Wang, Y. Fang, Y. Xi, X. Li, X. Zhang, K. Yu.
    arXiv:2410.18908. Accepted by IEEE JSTSP.
    https://arxiv.org/abs/2410.18908

  • TASU: Text-Only Alignment for Speech Understanding
    Jing Peng, Y. Yang, X. Li, Y. Xi, Q. Tang, Y. Fang, J. Li, K. Yu.
    Accepted by ICASSP 2026.

  • Low-Resource Domain Adaptation for Speech LLMs via Text-Only Fine-Tuning
    Y. Fang, Jing Peng, X. Li, Y. Xi, C. Zhang, G. Zhong, K. Yu.
    arXiv:2506.05671. Accepted by ASRU 2025.
    https://arxiv.org/abs/2506.05671

  • MOSA: Mixtures of Simple Adapters Outperform Monolithic Approaches in LLM-based Multilingual ASR
    J. Li, Jing Peng, Y. Fang, S. Wang, K. Yu.
    arXiv:2508.18998. Accepted by ICASSP 2026.
    https://arxiv.org/abs/2508.18998

  • Fewer Hallucinations, More Verification: A Three-Stage LLM-Based Framework for ASR Error Correction
    Y. Fang, B. Cheng, Jing Peng, X. Li, Y. Xi, C. Zhang, G. Zhong.
    arXiv:2505.24347. Accepted by ASRU 2025.
    https://arxiv.org/abs/2505.24347

  • ISA-Bench: Benchmarking Instruction Sensitivity for Large Audio Language Models
    B. Li, W. Huang, Y. Qiu, Y. Guo, H. Wang, Z. Li, Jing Peng, Z. Ma, X. Chen, K. Yu.
    arXiv:2510.23558. Accepted by ICASSP 2026.
    https://arxiv.org/abs/2510.23558


Resume

View or download my resume on the resume page.


Personal Information

I am originally from Hengyang, a beautiful city in Hunan, China. Outside of research, I enjoy exploring local food and traveling, and I am an avid fan of sports such as badminton and basketball.

If you are interested in my research directions, feel free to reach out—I am happy to discuss potential collaborations.