About me
Short Bio
My research focuses on Speech Large Language Models (Speech LLMs), with an emphasis on building well-aligned speech understanding systems that are robust to domain shift and multi-speaker conditions.
Basic Information
2003-10-06
Hengyang, Hunan, China
Mandarin, English
Education
Zhiyuan Honors Ph.D. Program, X-LANCE Lab, College of Computer Science
Ph.D. Student
B.Eng. in Automation, minor in Electrical Engineering
Qian Xuesen Honor Class
Exchange Student, Berkeley Global Access (BGA), College of Engineering
Research Interests
My research centers on building robust and practical speech understanding systems, ranging from foundational ASR to modern Speech Large Language Models.
Building reproducible experimentation frameworks and benchmarks to measure what speech understanding systems can and cannot do.
Aligning speech representations with language models through controllable simulation and text-only adaptation techniques.
Equipping speech and audio systems with agentic reasoning, multimodal evidence, and reliable multi-agent collaboration.
Tackling complex real-world scenarios with multiple speakers and multiple languages under unified frameworks.
Alongside Speech LLM research, I continue to work on foundational ASR problems.
Unified architectures such as TC-BiMamba that bridge streaming and non-streaming recognition.
LLM-based error correction and controllable contextual speech recognition.
Metrics such as RAS that assess the reliability of ASR outputs beyond word error rate (WER) alone.
Research Experience
AISpeech, Suzhou, Jiangsu
I work on ASR and multimodal alignment methods that connect speech representations with language model reasoning and instruction following.
Shenzhen Research Institute of Big Data, Remote
I explore Speech LLM-based frameworks for speaker-attributed transcription, aiming to improve speaker consistency and controllability in multi-speaker scenarios.
Hi Lab, Xiaohongshu, Shanghai
I study explicit speaker discrimination and implicit speaker selection strategies for multi-speaker understanding, with an eye toward robust speaker identity modeling under real-world conditions.
Publications (Selected)
- indicates equal contribution. See the full list.
Resume
Personal Information
If you are interested in my research directions, feel free to reach out. I am happy to discuss potential collaborations.