News 🎉
- TASU2
- SURE
- RAS
- VISA (Agent Track)
- A Survey on Speech Large Language Models for Understanding
- TASU — Oral
- MOSA — Poster
- ISA-Bench — Oral
- Low-Resource Domain Adaptation
- Fewer Hallucinations, More Verification
I am a Zhiyuan Honor Ph.D. student in the X-LANCE Lab at Shanghai Jiao Tong University (SJTU), advised by Prof. Kai Yu and co-advised by Prof. Shinji Watanabe, and I collaborate closely with Prof. Shuai Wang.
My research focuses on Speech Large Language Models (Speech LLMs), with an emphasis on building well-aligned speech understanding systems that are robust to domain shift and multi-speaker conditions.
Research Interests
My research centers on building robust and practical speech understanding systems, spanning from foundational ASR to modern Speech Large Language Models.
- Building reproducible experimentation frameworks and benchmarks to measure what speech understanding systems can and cannot do.
- Aligning speech representations with language models through controllable simulation and text-only adaptation techniques.
- Equipping speech and audio systems with agentic reasoning, multi-modal evidence, and reliable multi-agent collaboration.
- Tackling complex real-world scenarios with multiple speakers and multiple languages under unified frameworks.
Alongside Speech LLM research, I continue to work on foundational ASR problems.
- Unified architectures, such as TC-BiMamba, that bridge streaming and non-streaming recognition.
- LLM-based error correction and controllable contextual speech recognition.
- Metrics, such as RAS, that assess the reliability of ASR outputs beyond word error rate (WER) alone.
Research Experience
AISpeech, Suzhou, Jiangsu
I work on ASR and multimodal alignment methods that connect speech representations with language model reasoning and instruction following.
Shenzhen Research Institute of Big Data, Remote
I explore Speech LLM-based frameworks for speaker-attributed transcription, aiming to improve speaker consistency and controllability in multi-speaker scenarios.
Hi Lab, Xiaohongshu, Shanghai
I study explicit speaker discrimination and implicit speaker selection strategies for multi-speaker understanding, with an eye toward robust speaker identity modeling under real-world conditions.
Publications (Selected)
An asterisk (*) indicates equal contribution. See the full list →
Contact Information
I am always happy to chat and collaborate on the topics above. You can reach me via:
- Email: jing.peng@sjtu.edu.cn
- GitHub: https://github.com/PigeonDan1
- Google Scholar: https://scholar.google.com/citations?user=Uo0mj0AAAAAJ&hl=en
- Semantic Scholar: https://www.semanticscholar.org/author/Jing-Peng/2327961941
- LinkedIn: https://www.linkedin.com/in/jing-peng-7ab8682a4/