About Me

Short Bio

I am Jing Peng (彭景), a first-year Zhiyuan Honors Ph.D. student in the X-LANCE Lab at Shanghai Jiao Tong University (SJTU), advised by Prof. Kai Yu and co-advised by Prof. Shinji Watanabe. I also collaborate closely with Prof. Shuai Wang.

My research focuses on Speech Large Language Models (Speech LLMs), with an emphasis on building well-aligned speech understanding systems that are robust to domain shift and multi-speaker conditions.

Basic Information

🎂
Date of Birth

2003-10-06

📍
Hometown

Hengyang, Hunan, China

🌐
Languages

Mandarin, English

💻
GitHub

PigeonDan1

📚
Google Scholar

Profile

🔬
Semantic Scholar

Profile

🔗
LinkedIn

Profile

Education

Sep 2025 – Present
Shanghai Jiao Tong University (SJTU)

Zhiyuan Honors Ph.D. Program, X-LANCE Lab, College of Computer Science

Ph.D. Student

Sep 2021 – Jun 2025
Xi'an Jiao Tong University (XJTU)

B.Eng. in Automation, minor in Electrical Engineering

Qian Xuesen Honor Class

Aug 2023 – Dec 2023
University of California, Berkeley

Exchange Student, Berkeley Global Access (BGA), College of Engineering

Research Interests

My research centers on building robust and practical speech understanding systems, ranging from foundational automatic speech recognition (ASR) to modern Speech Large Language Models.

🧠 Speech Large Language Models for Understanding
📊 Survey & Benchmark

Building reproducible experimentation frameworks and benchmarks to measure what speech understanding systems can and cannot do.

Representative: SURE, ISA-Bench, Survey
🔗 Speech-Text Alignment

Aligning speech representations with language models through controllable simulation and text-only adaptation techniques.

Representative: TASU, TASU2
🤖 Agentic Systems

Equipping speech and audio systems with agentic reasoning, multi-modal evidence, and reliable multi-agent collaboration.

Representative: Audio-Mind, VISA, XFlow
🌍 Multilingual and Multispeaker

Tackling complex real-world scenarios with multiple speakers and multiple languages under unified frameworks.

Representative: G-STAR, MOSA
🎙️ Automatic Speech Recognition (Traditional)

Alongside Speech LLM research, I continue to work on foundational ASR problems.

🎙️ Streaming & Non-streaming ASR

Unified architectures such as TC-BiMamba that bridge streaming and non-streaming recognition.

Representative: TC-BiMamba
✍️ ASR Error Correction & Controllability

LLM-based error correction and controllable contextual speech recognition.

Representative: Fewer Hallucinations, Joint Decoding
📏 Reliability & Evaluation

Metrics such as RAS that assess the reliability of ASR outputs beyond word error rate alone.

Representative: RAS

Research Experience

🎙️ Speech LLMs for Speech Understanding

AISpeech, Suzhou, Jiangsu
I work on ASR and multimodal alignment methods that connect speech representations with language model reasoning and instruction following.

🗣️ SA-ASR with Speech LLMs

Shenzhen Research Institute of Big Data, Remote
I explore Speech LLM-based frameworks for speaker-attributed transcription, aiming to improve speaker consistency and controllability in multi-speaker scenarios.

👥 Speaker Discrimination on Omni/SLM

Hi Lab, Xiaohongshu, Shanghai
I study explicit speaker discrimination and implicit speaker selection strategies for multi-speaker understanding, with an eye toward robust speaker identity modeling under real-world conditions.

Publications (Selected)

Audio-Mind: An Auditable Agentic Framework for Audio Understanding
Y. Wang*, Jing Peng*, H. Li, C. Wang, W. Tu, Y. Xi, Z. Sun, K. Yu, S. Wang
arXiv:2605.28480 · Submitted to EMNLP 2026
VISA: A Visual Information Strengthened Audio-Reasoning System for the Interspeech 2026 ARC Agent Track
W. Tu, J. Gao, Y. Huo, Y. Wang, Jing Peng, B. Li, Z. Ma, T. Liu, S. Fan, K. Yu, X. Chen, Z. Zheng
arXiv:2606.07264v1 · Accepted by Interspeech 2026
XFlow: An Executable Protocol Programming System for Reliable Multi-Agent Workflows
H. Li*, Jing Peng*, Z. Wang, L. Chen, K. Yu
arXiv:2606.14790
TC-BiMamba: Trans-Chunk bidirectionally within BiMamba for unified streaming and non-streaming ASR
Jing Peng*, Q. She*, Y. Fang, Y. Xi, K. Yu
arXiv:2602.11546 · Submitted to EMNLP 2026
TASU2: Controllable CTC Simulation for Alignment and Low-Resource Adaptation of Speech LLMs
Jing Peng*, C. Wang*, Y. Yang, L. Qian, J. Li, Y. Xi, S. Wang, K. Yu
arXiv:2604.08384 · Accepted by Interspeech 2026
A Unified and Reproducible Experimentation Framework for Speech Understanding
Jing Peng*, J. Du*, C. Wang*, H. Li*, Y. Yang*, et al.
arXiv:2605.30899 · Accepted by Interspeech 2026
RAS: a Reliability Oriented Metric for Automatic Speech Recognition
W. Huang, Y. Qiu, B. Li, Y. Guo, Jing Peng, H. Wang, X. Chen, K. Yu
arXiv:2604.24278 · Accepted by Interspeech 2026
G-STAR: End-to-End Global Speaker-Tracking Attributed Recognition
Jing Peng*, Z. Chen*, H. Li*, Y. Wang, D. Ma, M. Li, Y. Du, D. Xu, K. Yu, S. Wang
arXiv:2603.10468 · Submitted to EMNLP 2026
A Survey on Speech Large Language Models for Understanding
Jing Peng*, Y. Wang*, Y. Fang, Y. Xi, X. Li, X. Zhang, K. Yu
arXiv:2410.18908 · Accepted by IEEE JSTSP
TASU: Text-Only Alignment for Speech Understanding
Jing Peng, Y. Yang, X. Li, Y. Xi, Q. Tang, Y. Fang, J. Li, K. Yu
arXiv:2511.03310 · Accepted by ICASSP 2026
Low-Resource Domain Adaptation for Speech LLMs via Text-Only Fine-Tuning
Y. Fang*, Jing Peng*, X. Li, Y. Xi, C. Zhang, G. Zhong, K. Yu
arXiv:2506.05671 · Accepted by ASRU 2025
MOSA: Mixtures of Simple Adapters Outperform Monolithic Approaches in LLM-based Multilingual ASR
J. Li, Jing Peng, Y. Fang, S. Wang, K. Yu
arXiv:2508.18998 · Accepted by ICASSP 2026
Fewer Hallucinations, More Verification: A Three-Stage LLM-Based Framework for ASR Error Correction
Y. Fang, B. Cheng, Jing Peng, X. Li, Y. Xi, C. Zhang, G. Zhong
arXiv:2505.24347 · Accepted by ASRU 2025
ISA-Bench: Benchmarking Instruction Sensitivity for Large Audio Language Models
B. Li, W. Huang, Y. Qiu, Y. Guo, H. Wang, Z. Li, Jing Peng, Z. Ma, X. Chen, K. Yu
arXiv:2510.23558 · Accepted by ICASSP 2026

Resume

📄 View or download my resume

Personal Information

I am originally from Hengyang, a beautiful city in Hunan, China. Outside of research, I enjoy traveling and exploring local food, and I am an avid fan of sports such as badminton and basketball.

If you are interested in my research directions, feel free to reach out. I am happy to discuss potential collaborations.