Publications
* indicates equal contribution.
Speech Large Language Models for Understanding
Survey & Benchmark
-
ISA-Bench: Benchmarking Instruction Sensitivity for Large Audio Language Models
B. Li, W. Huang, Y. Qiu, Y. Guo, H. Wang, Z. Li, Jing Peng, Z. Ma, X. Chen, K. Yu
ICASSP 2026 (accepted), 2026
-
A Unified and Reproducible Experimentation Framework for Speech Understanding
Jing Peng*, J. Du*, C. Wang*, H. Li*, Y. Yang*, Y. Wang, X. Gu, G. Chen, Y. Wang, J. Li, Z. Zhao, H. Wang, W. Tu, H. Li, D. Ma, L. Qian, Y. Xi, W. Wen, J. Guo, H. Zhang, S. Fan, W. Jiang, S. Wang, K. Yu
Interspeech 2026 (accepted), 2026
-
A Survey on Speech Large Language Models for Understanding
Jing Peng*, Y. Wang*, Y. Fang, Y. Xi, X. Li, X. Zhang, K. Yu
IEEE JSTSP (accepted), 2024
Speech-Text Alignment
-
TASU: Text-Only Alignment for Speech Understanding
Jing Peng, Y. Yang, X. Li, Y. Xi, Q. Tang, Y. Fang, J. Li, K. Yu
ICASSP 2026 (accepted), 2026
-
TASU2: Controllable CTC Simulation for Alignment and Low-Resource Adaptation of Speech LLMs
Jing Peng*, C. Wang*, Y. Yang, L. Qian, J. Li, Y. Xi, S. Wang, K. Yu
Interspeech 2026 (accepted), 2026
-
Low-Resource Domain Adaptation for Speech LLMs via Text-Only Fine-Tuning
Y. Fang*, Jing Peng*, X. Li, Y. Xi, C. Zhang, G. Zhong, K. Yu
ASRU 2025 (accepted), 2025
Agentic Systems
-
XFlow: An Executable Protocol Programming System for Reliable Multi-Agent Workflows
H. Li*, Jing Peng*, Z. Wang, L. Chen, K. Yu
arXiv preprint, 2026
-
VISA: A Visual Information Strengthened Audio-Reasoning System for the Interspeech 2026 ARC Agent Track
W. Tu, J. Gao, Y. Huo, Y. Wang, Jing Peng, B. Li, Z. Ma, T. Liu, S. Fan, K. Yu, X. Chen, Z. Zheng
Interspeech 2026 (accepted), 2026
-
Audio-Mind: An Auditable Agentic Framework for Audio Understanding
Y. Wang*, Jing Peng*, H. Li, C. Wang, W. Tu, Y. Xi, Z. Sun, K. Yu, S. Wang
submitted to EMNLP 2026, 2026
Multilingual and Multispeaker
-
MOSA: Mixtures of Simple Adapters Outperform Monolithic Approaches in LLM-based Multilingual ASR
Junjie Li, Jing Peng, Yangui Fang, Shuai Wang, Kai Yu
ICASSP 2026 (accepted), 2026
-
G-STAR: End-to-End Global Speaker-Tracking Attributed Recognition
Jing Peng*, Z. Chen*, H. Li*, Y. Wang, D. Ma, M. Li, Y. Du, D. Xu, K. Yu, S. Wang
submitted to EMNLP 2026, 2026
Automatic Speech Recognition (Traditional)
-
RAS: a Reliability Oriented Metric for Automatic Speech Recognition
W. Huang, Y. Qiu, B. Li, Y. Guo, Jing Peng, H. Wang, X. Chen, K. Yu
Interspeech 2026 (accepted), 2026
-
TC-BiMamba: Trans-Chunk bidirectionally within BiMamba for unified streaming and non-streaming ASR
Jing Peng*, Q. She*, Y. Fang, Y. Xi, K. Yu
submitted to EMNLP 2026, 2026
-
Joint Decoding Method for Controllable Contextual Speech Recognition Based on Speech LLM
Y. Fang*, Jing Peng*, Y. Xi, X. Li, H. Li, C. Zhang, G. Zhong, K. Yu
arXiv preprint, 2025
-
Fewer Hallucinations, More Verification: A Three-Stage LLM-Based Framework for ASR Error Correction
Y. Fang, B. Cheng, Jing Peng, X. Li, Y. Xi, C. Zhang, G. Zhong
ASRU 2025 (accepted), 2025