Rongjie Huang (黄融杰) is a final-year graduate student at the College of Computer Science and Software, Zhejiang University, supervised by Prof. Zhou Zhao. I also obtained my Bachelor’s degree at Zhejiang University. During my graduate study, I was lucky to collaborate with the CMU Speech Team led by Prof. Shinji Watanabe, and the Audio Research Team at Zhejiang University. I am grateful to have interned or collaborated at TikTok, Shanghai AI Lab, Tencent Seattle Lab, and Alibaba Qwen, with Yi Ren, Jinglin Liu, Chunlei Zhang, and Dong Yu.

My research interests include Multi-Modal Generative AI, Multi-Modal Language Processing, and AI4Science. I have published first-author papers at top international AI conferences such as NeurIPS/ICLR/ICML/ACL/IJCAI. I developed a few well-known Speech/NLP algorithms including:

  • AudioGPT, UniAudio, Make-A-Voice: Multitask, Multilingual LLMs
  • Make-An-Audio, GenerSpeech: Zero-shot text-guided synthesis
  • FastDiff 1/2, ProDiff: AIGC diffusion models
  • TranSpeech, AV-TranSpeech: Multimodal Translation

In 2024, I lead or participate in the following research topics:

  • Speech/NLP: multimodal generation and translation
  • Large Language Models (LLMs): Audio/Visual
  • Diffusion models: Image/Audio/3D

🔥 News

  • 2024.05: 6 papers are accepted by ACL 2024 (main conference and findings)! Thanks to my co-authors!
  • 2024.05: 3 papers are accepted by ICML 2024!
  • 2024.03: 1 paper is accepted by NAACL 2024 main conference!
  • 2024.01: 1 paper is accepted by ICLR 2024!
  • 2023.11: 2 papers are accepted by AAAI 2024 main / AAAI 2024 demo!
  • 2023.10: I am awarded the ByteDance Scholar Fellowship and the Chu Kochen Presidential Scholarship!
  • 2023.10: UniAudio released!
  • 2023.09: One paper is accepted by EMNLP 2023!
  • 2023.07: One paper is accepted by ACM-MM 2023!
  • 2023.06: One paper is accepted by ICCV 2023!
  • 2023.05: 8 papers are accepted by ACL 2023 (main conference and findings)! Thanks to my co-authors!
  • 2023.04: AudioGPT and HiFi-Codec released!
  • 2023.04: One paper is accepted by ICML 2023!
  • 2023.02: Make-An-Audio released! Media coverage: Heart of Machine, ByteDance and Twitter
  • 2023.01: One paper is accepted by ICLR 2023!
  • 2022.09: Two papers are accepted by NeurIPS 2022!

📝 Representative Publications

Multi-modal Generative AI

  • Spoken Large Language Model: InstructSpeech (ICML 2024), UniAudio (ICML 2024), AudioGPT (AAAI demo 2024), Make-A-Voice (ACL 2024), HiFi-Codec
  • Text-to-Audio Synthesis: Make-An-Audio (ICML 2023)
  • Text-to-Speech Synthesis: GenerSpeech (NeurIPS 2022) for zero-shot text-to-speech, FastDiff (IJCAI 2022) / ProDiff (ACM-MM 2022a) for diffusion text-to-speech
  • Singing Voice Synthesis: SingGAN (ACM-MM 2022b) / Multi-Singer (ACM-MM 2021)

Multi-modal Language Processing

  • Audio-Visual Speech-to-Speech Translation: TranSpeech (ICLR 2023) / AV-TranSpeech (ACL 2023)
  • Self-Supervised Learning: Prosody-MAE (ACL 2023)
arXiv 2023
ICML 2023
ICLR 2023

Part of our continuing effort to reduce communication barriers, with follow-up works: Audio-Visual S2T (MixSpeech, ICCV 2023), Audio-Visual S2ST (AV-TranSpeech, ACL 2023), Multi-modal S2ST, Style-aware S2ST, Zero-shot S2ST. Code released.

NeurIPS 2022

The first zero-shot TTS generalizable to unseen speakers, emotions, and prosody! Media coverage: PaperWeekly, Speech Home. Code released.

IJCAI 2022

Part of our continuing effort in generative modeling, with follow-up works FastDiff 2 and ProDiff. We release a diffusion text-to-speech pipeline on Hugging Face using ProDiff and FastDiff. Our work has been promoted by various media and forums, such as Tencent AI Lab, Speech Home, and Twitter, and has been a trending project on both GitHub and Papers with Code.

Full Publication List

* denotes co-first authors, # denotes co-supervised

2024

2023

2022

2021

2020 and Prior

Selected Honors Awarded

  • Excellent Graduate, Zhejiang Province (2024).
  • Chu Kochen Presidential Scholarship (2023), highest honor at Zhejiang University
  • ByteDance Scholar Fellowship (100k RMB Bonus), 10 students per year
  • ICML/ICLR Grant Award
  • Outstanding Reviewer, ICML’22. Top 10%.
  • National Scholarship (2022, 2023, Graduate student). Top 1%.
  • National Scholarship (2020, 2021, Undergraduate student). Top 1%.
  • Excellent Graduate, Zhejiang Province (2021).
  • Chu Kochen Presidential Scholarship Finalist (2021).
  • First Prize in American Mathematical Modeling Competition (2020).
  • First Prize of National Mathematical Modeling Competition in Zhejiang Province (2019).

Professional Services

  • Conference Reviewer/Program Committee: ICML 2022, ACM-MM 2022, NeurIPS 2022, ARR 2022, ICML 2023, ARR 2023, ACL 2023, EMNLP 2023, ACM-MM 2023, NeurIPS 2023, ICLR 2023, Neurocomputing, IJCAI 2024, ACM-MM 2024, ACL 2024, TIP
  • Assistant Reviewer: KDD 2022, AAAI 2022, EMNLP 2022, PRCV 2021, TMM