I am Rongjie Huang (黄融杰), a member of the Seamless Team at FAIR. Previously, I graduated from the College of Computer Science, Zhejiang University, supervised by Prof. Zhou Zhao. I also obtained my Bachelor’s degree at Zhejiang University. My research interests include Multi-modal Large Language Models, Video-Audio Generative Models, and Audio-Visual Language Processing. I have published first-author papers at top international AI conferences such as NeurIPS, ICLR, ICML, ACL, and IJCAI. I have developed a few well-known Speech/NLP algorithms, including:

Multimodal LLMs: Seamless-Interaction (Llama 4 + Dyadic Motion Diffusion), AudioGPT, UniAudio, Make-A-Voice
Omni-modal Audio Generative Models: Lumina-T2X (omni-modal), Make-An-Audio, Make-An-Audio-2, FastDiff, GenerSpeech
Multimodal Translation: TranSpeech and AV-TranSpeech

🔥 News

2025.04: I was awarded the Best Thesis Award by the Electrical Engineering Association!
2025.02: 4 papers are accepted by ICLR 2025!
2025.01: 1 paper is accepted by AAAI 2025!
2024.10: 6 papers are accepted by NeurIPS 2024!
2024.05: 6 papers are accepted by ACL 2024 (main conference and findings)!
2024.05: 3 papers are accepted by ICML 2024!
2024.03: 1 paper is accepted by NAACL 2024 main conference!
2024.01: 1 paper is accepted by ICLR 2024!
2023.11: 2 papers are accepted by AAAI 2024 main / AAAI 2024 demo!
2023.10: I was awarded the ByteDance Scholar Fellowship and the Chu Kochen Presidential Scholarship!
2023.10: UniAudio released!
2023.09: One paper is accepted by EMNLP 2023!
2023.07: One paper is accepted by ACM-MM 2023!
2023.06: One paper is accepted by ICCV 2023!
2023.05: 8 papers are accepted by ACL 2023 (main conference and findings)! Thanks to my co-authors!
2023.04: AudioGPT and HiFi-Codec released!
2023.04: One paper is accepted by ICML 2023!
2023.02: Make-An-Audio released! Media coverage: Heart of Machine, ByteDance and Twitter
2023.01: One paper is accepted by ICLR 2023!
2022.09: Two papers are accepted by NeurIPS 2022!