🗂️ Selected Projects

🎙 Speech Synthesis

Few-shot Voice Cloning and Style Transfer

  • Achieved few-shot voice cloning from 20 utterances. Timbre similarity MOS (SMOS) reached 4.6 with a MOS of 3.8, along with a clear pronunciation-correction effect for L2 English speakers. [patented]
  • Used a pre-train and fine-tune paradigm with frame-level pitch modeling to achieve few-shot style transfer from 20 utterances. Improved style SMOS from 3.5 to 4.5 while keeping naturalness MOS above 4.0.

🎼 Music

Probabilistic Topic Model-Based Music Recommendation System (Supervisor: Vladimir Pavlovic)

  • Used a CRNN for music tagging, and applied Latent Dirichlet Allocation (LDA) and Hierarchical Dirichlet Process (HDP) probabilistic topic models for music topic modeling.
  • Used KL divergence to measure the similarity between song-topic distributions for recommendation.
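
As an illustration of the KL-divergence step, here is a minimal sketch, assuming each song is already represented by a topic distribution (for example, from LDA). The function names, the smoothing constant `eps`, and the toy catalog are hypothetical and not taken from the project itself.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same topics.

    eps is an assumed smoothing constant that avoids log(0).
    """
    p = [x + eps for x in p]
    q = [x + eps for x in q]
    p_sum, q_sum = sum(p), sum(q)
    return sum(
        (a / p_sum) * math.log((a / p_sum) / (b / q_sum))
        for a, b in zip(p, q)
    )

def recommend(query, catalog, top_k=2):
    """Return the top_k song names whose topic mix diverges least from the query."""
    ranked = sorted(catalog, key=lambda name: kl_divergence(query, catalog[name]))
    return ranked[:top_k]

# Toy song-topic distributions over three topics (made-up numbers).
catalog = {
    "song_a": [0.70, 0.20, 0.10],
    "song_b": [0.10, 0.10, 0.80],
    "song_c": [0.60, 0.30, 0.10],
}
query = [0.68, 0.22, 0.10]
print(recommend(query, catalog))
```

KL divergence is asymmetric, so a symmetric variant such as Jensen-Shannon divergence may be preferable when the direction of comparison should not matter.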

💬 Speech Recognition & Evaluation

Recognition and Evaluation of Oral English

  • Designed and optimized the Goodness of Pronunciation (GOP) feature; implemented and tuned LR, XGBoost, and LSTM classifiers. Attained a state-of-the-art consistency rate for oral English evaluation. [patented]
  • Trained and optimized a full chain-model pipeline on the Kaldi framework, including corpus crawling, language and acoustic model training, Bi-RNN implementation, RNN rescoring, etc.
  • Achieved 5%-10% WER on various benchmark datasets and outperformed the Google ASR API on children's speech datasets.
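
For reference, WER is the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below shows that standard definition. It is not the scoring tool used in the project, and the function name and whitespace tokenization are assumptions.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)

print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.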

🗣️ Voice Conversion

Voice Conversion Timbre Similarity Improvement

  • Method: Optimized the hidden-representation bottleneck of an any-to-one, PPG-based VC pipeline. [patented]
  • Result: Improved the similarity MOS of voice timbre from 3.9 to 4.3.
  • Implemented many-to-many VC models such as VQ-VAE and StarGAN-VC for comparison.