Publications

* indicates equal contribution.

Speech Large Language Models for Understanding

Survey & Benchmark

  • ISA-Bench: Benchmarking Instruction Sensitivity for Large Audio Language Models
    B. Li, W. Huang, Y. Qiu, Y. Guo, H. Wang, Z. Li, Jing Peng, Z. Ma, X. Chen, K. Yu
    ICASSP 2026 (accepted), 2026
  • A Unified and Reproducible Experimentation Framework for Speech Understanding
    Jing Peng*, J. Du*, C. Wang*, H. Li*, Y. Yang*, Y. Wang, X. Gu, G. Chen, Y. Wang, J. Li, Z. Zhao, H. Wang, W. Tu, H. Li, D. Ma, L. Qian, Y. Xi, W. Wen, J. Guo, H. Zhang, S. Fan, W. Jiang, S. Wang, K. Yu
    Interspeech 2026 (accepted), 2026
  • A Survey on Speech Large Language Models for Understanding
    Jing Peng*, Y. Wang*, Y. Fang, Y. Xi, X. Li, X. Zhang, K. Yu
    IEEE JSTSP (accepted), 2024

Speech-Text Alignment

  • TASU: Text-Only Alignment for Speech Understanding
    Jing Peng, Y. Yang, X. Li, Y. Xi, Q. Tang, Y. Fang, J. Li, K. Yu
    ICASSP 2026 (accepted), 2026
  • TASU2: Controllable CTC Simulation for Alignment and Low-Resource Adaptation of Speech LLMs
    Jing Peng*, C. Wang*, Y. Yang, L. Qian, J. Li, Y. Xi, S. Wang, K. Yu
    Interspeech 2026 (accepted), 2026
  • Low-Resource Domain Adaptation for Speech LLMs via Text-Only Fine-Tuning
    Y. Fang*, Jing Peng*, X. Li, Y. Xi, C. Zhang, G. Zhong, K. Yu
    ASRU 2025 (accepted), 2025

Agentic Systems

  • XFlow: An Executable Protocol Programming System for Reliable Multi-Agent Workflows
    H. Li*, Jing Peng*, Z. Wang, L. Chen, K. Yu
    arXiv preprint, 2026
  • VISA: A Visual Information Strengthened Audio-Reasoning System for the Interspeech 2026 ARC Agent Track
    W. Tu, J. Gao, Y. Huo, Y. Wang, Jing Peng, B. Li, Z. Ma, T. Liu, S. Fan, K. Yu, X. Chen, Z. Zheng
    Interspeech 2026 (accepted), 2026
  • Audio-Mind: An Auditable Agentic Framework for Audio Understanding
    Y. Wang*, Jing Peng*, H. Li, C. Wang, W. Tu, Y. Xi, Z. Sun, K. Yu, S. Wang
    submitted to EMNLP 2026, 2026

Multilingual and Multispeaker

  • MOSA: Mixtures of Simple Adapters Outperform Monolithic Approaches in LLM-based Multilingual ASR
    J. Li, Jing Peng, Y. Fang, S. Wang, K. Yu
    ICASSP 2026 (accepted), 2026
  • G-STAR: End-to-End Global Speaker-Tracking Attributed Recognition
    Jing Peng*, Z. Chen*, H. Li*, Y. Wang, D. Ma, M. Li, Y. Du, D. Xu, K. Yu, S. Wang
    submitted to EMNLP 2026, 2026

Automatic Speech Recognition (Traditional)

  • RAS: a Reliability Oriented Metric for Automatic Speech Recognition
    W. Huang, Y. Qiu, B. Li, Y. Guo, Jing Peng, H. Wang, X. Chen, K. Yu
    Interspeech 2026 (accepted), 2026
  • TC-BiMamba: Trans-Chunk bidirectionally within BiMamba for unified streaming and non-streaming ASR
    Jing Peng*, Q. She*, Y. Fang, Y. Xi, K. Yu
    submitted to EMNLP 2026, 2026
  • Joint Decoding Method for Controllable Contextual Speech Recognition Based on Speech LLM
    Y. Fang*, Jing Peng*, Y. Xi, X. Li, H. Li, C. Zhang, G. Zhong, K. Yu
    arXiv preprint, 2025
  • Fewer Hallucinations, More Verification: A Three-Stage LLM-Based Framework for ASR Error Correction
    Y. Fang, B. Cheng, Jing Peng, X. Li, Y. Xi, C. Zhang, G. Zhong
    ASRU 2025 (accepted), 2025