🌐 Website Preview - Jing Peng

English | 中文

I am currently a Zhiyuan Honor Ph.D. student in the X-LANCE Lab at Shanghai Jiao Tong University (SJTU), advised by Prof. Kai Yu and co-advised by Prof. Shinji Watanabe.

My research focuses on Speech Large Language Models (Speech LLMs), with an emphasis on building well-aligned speech understanding systems that are robust to domain shift and multi-speaker conditions.


Research Interests


Publications (Selected)

* indicates equal contribution.

TASU2: Controllable CTC Simulation for Alignment and Low-Resource Adaptation of Speech LLMs
Jing Peng*, C. Wang*, Y. Yang, L. Qian, J. Li, Y. Xi, S. Wang, K. Yu.
arXiv:2604.08384. Accepted by Interspeech 2026.
arXiv
Audio-Mind: An Auditable Agentic Framework for Audio Understanding
Y. Wang*, Jing Peng*, H. Li, C. Wang, W. Tu, Y. Xi, Z. Sun, K. Yu, S. Wang.
arXiv:2605.28480. Submitted to EMNLP 2026.
arXiv
G-STAR: End-to-End Global Speaker-Tracking Attributed Recognition
Jing Peng*, Z. Chen*, H. Li*, Y. Wang, D. Ma, M. Li, Y. Du, D. Xu, K. Yu, S. Wang.
arXiv:2603.10468. Submitted to EMNLP 2026.
arXiv
TC-BiMamba: Trans-Chunk bidirectionally within BiMamba for unified streaming and non-streaming ASR
Jing Peng*, Q. She*, Y. Fang, Y. Xi, K. Yu.
arXiv:2602.11546. Submitted to EMNLP 2026.
arXiv
A Unified and Reproducible Experimentation Framework for Speech Understanding
Jing Peng*, J. Du*, C. Wang*, H. Li*, Y. Yang*, Y. Wang, X. Gu, G. Chen, Y. Wang, J. Li, Z. Zhao, H. Wang, W. Tu, H. Li, D. Ma, L. Qian, Y. Xi, W. Wen, J. Guo, H. Zhang, S. Fan, W. Jiang, S. Wang, K. Yu.
arXiv:2605.30899. Accepted by Interspeech 2026.
arXiv
RAS: a Reliability Oriented Metric for Automatic Speech Recognition
W. Huang, Y. Qiu, B. Li, Y. Guo, Jing Peng, H. Wang, X. Chen, K. Yu.
arXiv:2604.24278. Accepted by Interspeech 2026.
arXiv
A Survey on Speech Large Language Models for Understanding
Jing Peng*, Y. Wang*, Y. Fang, Y. Xi, X. Li, X. Zhang, K. Yu.
arXiv:2410.18908. Accepted by IEEE JSTSP.
arXiv
TASU: Text-Only Alignment for Speech Understanding
Jing Peng, Y. Yang, X. Li, Y. Xi, Q. Tang, Y. Fang, J. Li, K. Yu.
arXiv:2511.03310. Accepted by ICASSP 2026.
arXiv

Contact Information

English | 中文

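I am a Zhiyuan Honor Ph.D. student in the X-LANCE Lab at Shanghai Jiao Tong University (SJTU), advised by Prof. Kai Yu and co-advised by Prof. Shinji Watanabe.

My research focuses on Speech Large Language Models (Speech LLMs), with an emphasis on building well-aligned speech understanding systems that are robust to domain shift and multi-speaker conditions.


Research Interests


Publications (Selected)

* indicates equal contribution.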

TASU2: Controllable CTC Simulation for Alignment and Low-Resource Adaptation of Speech LLMs
Jing Peng*, C. Wang*, Y. Yang, L. Qian, J. Li, Y. Xi, S. Wang, K. Yu.
arXiv:2604.08384. Accepted by Interspeech 2026.
arXiv
Audio-Mind: An Auditable Agentic Framework for Audio Understanding
Y. Wang*, Jing Peng*, H. Li, C. Wang, W. Tu, Y. Xi, Z. Sun, K. Yu, S. Wang.
arXiv:2605.28480. Submitted to EMNLP 2026.
arXiv
G-STAR: End-to-End Global Speaker-Tracking Attributed Recognition
Jing Peng*, Z. Chen*, H. Li*, Y. Wang, D. Ma, M. Li, Y. Du, D. Xu, K. Yu, S. Wang.
arXiv:2603.10468. Submitted to EMNLP 2026.
arXiv
TC-BiMamba: Trans-Chunk bidirectionally within BiMamba for unified streaming and non-streaming ASR
Jing Peng*, Q. She*, Y. Fang, Y. Xi, K. Yu.
arXiv:2602.11546. Submitted to EMNLP 2026.
arXiv
A Unified and Reproducible Experimentation Framework for Speech Understanding
Jing Peng*, J. Du*, C. Wang*, H. Li*, Y. Yang*, Y. Wang, X. Gu, G. Chen, Y. Wang, J. Li, Z. Zhao, H. Wang, W. Tu, H. Li, D. Ma, L. Qian, Y. Xi, W. Wen, J. Guo, H. Zhang, S. Fan, W. Jiang, S. Wang, K. Yu.
arXiv:2605.30899. Accepted by Interspeech 2026.
arXiv
A Survey on Speech Large Language Models for Understanding
Jing Peng*, Y. Wang*, Y. Fang, Y. Xi, X. Li, X. Zhang, K. Yu.
arXiv:2410.18908. Accepted by IEEE JSTSP.
arXiv
TASU: Text-Only Alignment for Speech Understanding
Jing Peng, Y. Yang, X. Li, Y. Xi, Q. Tang, Y. Fang, J. Li, K. Yu.
arXiv:2511.03310. Accepted by ICASSP 2026.
arXiv

Contact Information

Publications

* indicates equal contribution.

🧠 Speech Large Language Models for Understanding

📊 Survey & Benchmark

A Unified and Reproducible Experimentation Framework for Speech Understanding
Jing Peng*, J. Du*, C. Wang*, H. Li*, Y. Yang*, Y. Wang, X. Gu, G. Chen, Y. Wang, J. Li, Z. Zhao, H. Wang, W. Tu, H. Li, D. Ma, L. Qian, Y. Xi, W. Wen, J. Guo, H. Zhang, S. Fan, W. Jiang, S. Wang, K. Yu
Interspeech 2026 (accepted), 2026
arXiv
A Survey on Speech Large Language Models for Understanding
Jing Peng*, Y. Wang*, Y. Fang, Y. Xi, X. Li, X. Zhang, K. Yu
IEEE JSTSP (accepted), 2024
arXiv
ISA-Bench: Benchmarking Instruction Sensitivity for Large Audio Language Models
B. Li, W. Huang, Y. Qiu, Y. Guo, H. Wang, Z. Li, Jing Peng, Z. Ma, X. Chen, K. Yu
ICASSP 2026 (accepted), 2026
arXiv

🔗 Speech-Text Alignment

TASU2: Controllable CTC Simulation for Alignment and Low-Resource Adaptation of Speech LLMs
Jing Peng*, C. Wang*, Y. Yang, L. Qian, J. Li, Y. Xi, S. Wang, K. Yu
Interspeech 2026 (accepted), 2026
arXiv
TASU: Text-Only Alignment for Speech Understanding
Jing Peng, Y. Yang, X. Li, Y. Xi, Q. Tang, Y. Fang, J. Li, K. Yu
ICASSP 2026 (accepted), 2026
arXiv
Low-Resource Domain Adaptation for Speech LLMs via Text-Only Fine-Tuning
Y. Fang*, Jing Peng*, X. Li, Y. Xi, C. Zhang, G. Zhong, K. Yu
ASRU 2025 (accepted), 2025
arXiv

🤖 Agentic Systems

Audio-Mind: An Auditable Agentic Framework for Audio Understanding
Y. Wang*, Jing Peng*, H. Li, C. Wang, W. Tu, Y. Xi, Z. Sun, K. Yu, S. Wang
EMNLP 2026 (submitted), 2026
arXiv
VISA: A Visual Information Strengthened Audio-Reasoning System for the Interspeech 2026 ARC Agent Track
W. Tu, J. Gao, Y. Huo, Y. Wang, Jing Peng, B. Li, Z. Ma, T. Liu, S. Fan, K. Yu, X. Chen, Z. Zheng
Interspeech 2026 (accepted), 2026
arXiv
XFlow: An Executable Protocol Programming System for Reliable Multi-Agent Workflows
H. Li*, Jing Peng*, Z. Wang, L. Chen, K. Yu
arXiv preprint, 2026
arXiv
Externalizing Research Synthesis and Validation in AI Scientists through a Research Harness
Z. Wang, H. Li, Z. Yang, Z. Hu, S. Zuo, Y. Zhang, D. Ma, D. Luo, C. Wang, Jing Peng, T. Huang, S. Guo, H. Wang, Z. Zhu, S. Han, Y. Cao, K. Yu, L. Chen
arXiv preprint, 2026
arXiv

🌍 Multilingual and Multispeaker

G-STAR: End-to-End Global Speaker-Tracking Attributed Recognition
Jing Peng*, Z. Chen*, H. Li*, Y. Wang, D. Ma, M. Li, Y. Du, D. Xu, K. Yu, S. Wang
EMNLP 2026 (submitted), 2026
arXiv
MOSA: Mixtures of Simple Adapters Outperform Monolithic Approaches in LLM-based Multilingual ASR
J. Li, Jing Peng, Y. Fang, S. Wang, K. Yu
ICASSP 2026 (accepted), 2026
arXiv

🎙️ Automatic Speech Recognition (Traditional)

TC-BiMamba: Trans-Chunk bidirectionally within BiMamba for unified streaming and non-streaming ASR
Jing Peng*, Q. She*, Y. Fang, Y. Xi, K. Yu
EMNLP 2026 (submitted), 2026
arXiv
Fewer Hallucinations, More Verification: A Three-Stage LLM-Based Framework for ASR Error Correction
Y. Fang, B. Cheng, Jing Peng, X. Li, Y. Xi, C. Zhang, G. Zhong
ASRU 2025 (accepted), 2025
arXiv
Joint Decoding Method for Controllable Contextual Speech Recognition Based on Speech LLM
Y. Fang*, Jing Peng*, Y. Xi, X. Li, H. Li, C. Zhang, G. Zhong, K. Yu
arXiv preprint, 2025
arXiv
RAS: a Reliability Oriented Metric for Automatic Speech Recognition
W. Huang, Y. Qiu, B. Li, Y. Guo, Jing Peng, H. Wang, X. Chen, K. Yu
Interspeech 2026 (accepted), 2026
arXiv

📋 File Structure

PigeonDan1.github.io/
├── index.md              (English homepage with G-STAR added)
├── index-zh.md           (Chinese homepage)
├── about.md              (English about page)
├── about-zh.md           (Chinese about page)
├── resume.md             (English resume page)
├── resume-zh.md          (Chinese resume page)
├── publications.md       (English publications page)
├── publications-zh.md    (Chinese publications page)
├── _data/
│   ├── navigation.yml    (Added Chinese link)
│   └── publications.yml  (Added G-STAR paper)
└── preview.html          (This preview file)
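
The tree above pairs every English page with a `-zh.md` counterpart. A minimal shell sketch for checking that pairing before committing might look like the following. The function name `check_zh_pairs` is hypothetical and not part of the site, and the sketch assumes the `page.md` / `page-zh.md` naming convention applies to every Markdown file in the checked directory (an unrelated file such as a README would be reported as unpaired).

```shell
#!/bin/sh
# Hypothetical helper: list English pages in DIR that lack a "-zh.md" counterpart.
# Assumes the naming convention shown in the tree: page.md <-> page-zh.md.
check_zh_pairs() {
    dir="${1:-.}"
    missing=0
    for page in "$dir"/*.md; do
        [ -e "$page" ] || continue      # glob matched nothing
        case "$page" in
            *-zh.md) continue ;;        # skip the Chinese pages themselves
        esac
        zh="${page%.md}-zh.md"          # index.md -> index-zh.md
        if [ ! -e "$zh" ]; then
            echo "missing: ${zh##*/}"   # print only the file name
            missing=1
        fi
    done
    return "$missing"                   # 0 if every page is paired, 1 otherwise
}
```

A possible use from the repository root: `check_zh_pairs . && echo "every page has a Chinese counterpart"`.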
        

🚀 How to Deploy

  1. Commit all changes:
    git add .
    git commit -m "Add G-STAR paper and Chinese language support"
    git push origin master
  2. GitHub Pages builds and deploys the site automatically, usually within 1-2 minutes.
  3. Visit https://pigeondan1.github.io to see the changes.
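
The push in step 1 targets `master`, so it only publishes the intended work if that branch is the one checked out. A small guard along the following lines could be run first. This is a sketch under assumptions: `on_branch` is a hypothetical helper name, and whether `master` is actually the branch GitHub Pages publishes from depends on the repository settings, which this page does not state.

```shell
#!/bin/sh
# Hypothetical helper: succeed only when the repository in DIR has branch
# EXPECTED checked out. Returns 2 if DIR is not a repository or HEAD is detached.
on_branch() {
    expected="$1"
    dir="${2:-.}"
    # symbolic-ref prints the short branch name that HEAD points to
    current=$(git -C "$dir" symbolic-ref --short HEAD 2>/dev/null) || return 2
    [ "$current" = "$expected" ]
}
```

For example, `on_branch master && git push origin master` pushes only when `master` is checked out.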

📝 Summary of Changes