About me


Short Bio

I am Jing Peng (彭景), a first-year Zhiyuan Honors Ph.D. student in the X-LANCE Lab at Shanghai Jiao Tong University (SJTU). I am advised by Prof. Kai Yu and co-advised by Prof. Shinji Watanabe, and I collaborate closely with Prof. Shuai Wang.

My research focuses on Speech Large Language Models (Speech LLMs), with an emphasis on building well-aligned speech understanding systems that are robust to domain shift and multi-speaker conditions.

Basic Information

🎂 Date of Birth: 2003-10-06

📍 Hometown: Hengyang, Hunan, China

🌐 Languages: Mandarin, English

✉️ Email: jing.peng@sjtu.edu.cn

💻 GitHub: PigeonDan1

📚 Google Scholar: Profile

🔬 Semantic Scholar: Profile

🔗 Profile

Education

Sep 2025 – Present

Shanghai Jiao Tong University (SJTU)

Zhiyuan Honors Ph.D. Program, X-LANCE Lab, College of Computer Science

Ph.D. Student

Sep 2021 – Jun 2025

Xi'an Jiaotong University (XJTU)

B.Eng. in Automation, minor in Electrical Engineering

Qian Xuesen Honor Class

Aug 2023 – Dec 2023

University of California, Berkeley

Exchange Student, Berkeley Global Access (BGA), College of Engineering

Research Interests

My research centers on building robust and practical speech understanding systems, from foundational ASR to modern Speech Large Language Models.

🧠 Speech Large Language Models for Understanding

📊 Survey & Benchmark

Building reproducible experimentation frameworks and benchmarks to measure what speech understanding systems can and cannot do.

Representative: SURE, ISA-Bench, Survey

🔗 Speech-Text Alignment

Aligning speech representations with language models through controllable simulation and text-only adaptation techniques.

Representative: TASU, TASU2

🤖 Agentic Systems

Equipping speech and audio systems with agentic reasoning, multi-modal evidence, and reliable multi-agent collaboration.

Representative: Audio-Mind, VISA, XFlow

🌍 Multilingual and Multispeaker

Tackling complex real-world scenarios with multiple speakers and multiple languages under unified frameworks.

Representative: G-STAR, MOSA

🎙️ Automatic Speech Recognition (Traditional)

Alongside Speech LLM research, I continue to work on foundational ASR problems.

🎙️ Streaming & Non-streaming ASR

Unified architectures such as TC-BiMamba that bridge streaming and non-streaming recognition.

Representative: TC-BiMamba

✍️ ASR Error Correction & Controllability

LLM-based error correction and controllable contextual speech recognition.

Representative: Fewer Hallucinations, Joint Decoding

📏 Reliability & Evaluation

Metrics such as RAS that measure the reliability of ASR outputs beyond word error rate alone.

Representative: RAS

Research Experience

🎙️ Speech LLMs for Speech Understanding

AISpeech, Suzhou, Jiangsu
I work on ASR and multimodal alignment methods that connect speech representations with language model reasoning and instruction following.

🗣️ SA-ASR with Speech LLMs

Shenzhen Research Institute of Big Data, Remote
I explore Speech LLM-based frameworks for speaker-attributed transcription, aiming to improve speaker consistency and controllability in multi-speaker scenarios.

👥 Speaker Discrimination on Omni/SLM

Hi Lab, Xiaohongshu, Shanghai
I study explicit speaker discrimination and implicit speaker selection strategies for multi-speaker understanding, with an eye toward robust speaker identity modeling under real-world conditions.

Publications (Selected)

* indicates equal contribution. See the full list →

Audio-Mind: An Auditable Agentic Framework for Audio Understanding

Y. Wang*, Jing Peng*, H. Li, C. Wang, W. Tu, Y. Xi, Z. Sun, K. Yu, S. Wang

arXiv:2605.28480 · Submitted to EMNLP 2026

VISA: A Visual Information Strengthened Audio-Reasoning System for the Interspeech 2026 ARC Agent Track

W. Tu, J. Gao, Y. Huo, Y. Wang, Jing Peng, B. Li, Z. Ma, T. Liu, S. Fan, K. Yu, X. Chen, Z. Zheng

arXiv:2606.07264v1 · Accepted by Interspeech 2026

XFlow: An Executable Protocol Programming System for Reliable Multi-Agent Workflows

H. Li*, Jing Peng*, Z. Wang, L. Chen, K. Yu

arXiv:2606.14790

TC-BiMamba: Trans-Chunk bidirectionally within BiMamba for unified streaming and non-streaming ASR

Jing Peng*, Q. She*, Y. Fang, Y. Xi, K. Yu

arXiv:2602.11546 · Submitted to EMNLP 2026

TASU2: Controllable CTC Simulation for Alignment and Low-Resource Adaptation of Speech LLMs

Jing Peng*, C. Wang*, Y. Yang, L. Qian, J. Li, Y. Xi, S. Wang, K. Yu

arXiv:2604.08384 · Accepted by Interspeech 2026

A Unified and Reproducible Experimentation Framework for Speech Understanding

Jing Peng*, J. Du*, C. Wang*, H. Li*, Y. Yang*, et al.

arXiv:2605.30899 · Accepted by Interspeech 2026

RAS: a Reliability Oriented Metric for Automatic Speech Recognition

W. Huang, Y. Qiu, B. Li, Y. Guo, Jing Peng, H. Wang, X. Chen, K. Yu

arXiv:2604.24278 · Accepted by Interspeech 2026

G-STAR: End-to-End Global Speaker-Tracking Attributed Recognition

Jing Peng*, Z. Chen*, H. Li*, Y. Wang, D. Ma, M. Li, Y. Du, D. Xu, K. Yu, S. Wang

arXiv:2603.10468 · Submitted to EMNLP 2026

A Survey on Speech Large Language Models for Understanding

Jing Peng*, Y. Wang*, Y. Fang, Y. Xi, X. Li, X. Zhang, K. Yu

arXiv:2410.18908 · Accepted by IEEE JSTSP

TASU: Text-Only Alignment for Speech Understanding

Jing Peng, Y. Yang, X. Li, Y. Xi, Q. Tang, Y. Fang, J. Li, K. Yu

arXiv:2511.03310 · Accepted by ICASSP 2026

Low-Resource Domain Adaptation for Speech LLMs via Text-Only Fine-Tuning

Y. Fang*, Jing Peng*, X. Li, Y. Xi, C. Zhang, G. Zhong, K. Yu

arXiv:2506.05671 · Accepted by ASRU 2025

MOSA: Mixtures of Simple Adapters Outperform Monolithic Approaches in LLM-based Multilingual ASR

Junjie Li, Jing Peng, Yangui Fang, Shuai Wang, Kai Yu

arXiv:2508.18998 · Accepted by ICASSP 2026

Fewer Hallucinations, More Verification: A Three-Stage LLM-Based Framework for ASR Error Correction

Y. Fang, B. Cheng, Jing Peng, X. Li, Y. Xi, C. Zhang, G. Zhong

arXiv:2505.24347 · Accepted by ASRU 2025

ISA-Bench: Benchmarking Instruction Sensitivity for Large Audio Language Models

B. Li, W. Huang, Y. Qiu, Y. Guo, H. Wang, Z. Li, Jing Peng, Z. Ma, X. Chen, K. Yu

arXiv:2510.23558 · Accepted by ICASSP 2026

Resume

📄 View or download my resume

Personal Information

I am from Hengyang, a beautiful city in Hunan, China. Outside of research, I enjoy exploring local food and traveling, and I am a keen fan of sports such as badminton and basketball.

If you are interested in my research directions, feel free to reach out; I am happy to discuss potential collaborations.