🗂️ Selected Projects
🎙 Speech Synthesis
Few-shot Voice Cloning and Style Transfer
- Achieved few-shot voice cloning from 20 utterances, reaching a timbre similarity MOS (SMOS) of 4.6 and a MOS of 3.8, with a clear pronunciation-correction effect for L2 English speakers. [patented]
- Used a pre-train-then-fine-tune paradigm and frame-level pitch modeling to achieve few-shot style transfer from 20 utterances. Improved style SMOS from 3.5 to 4.5 while keeping naturalness MOS above 4.0.
🎼 Music
Probabilistic Topic Model-Based Music Recommendation System (supervisor: Vladimir Pavlovic)
- Used a CRNN for music tagging and applied Latent Dirichlet Allocation (LDA) and Hierarchical Dirichlet Process (HDP) probabilistic topic models for music topic modeling.
- Used KL divergence to measure the similarity between song-topic distributions for recommendation.
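
The KL-based ranking step can be illustrated with a minimal sketch. It assumes each song is already represented as a discrete topic distribution (for example, from LDA), and it ranks candidates by KL(query || candidate). The function names, the smoothing constant `eps`, and the toy catalog are all hypothetical; the project's actual scoring may differ (a symmetrized variant is a common alternative).

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same topics.

    eps is a small smoothing constant (an assumption of this sketch)
    that avoids division by zero when q assigns zero mass to a topic.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

def recommend(query, catalog, top_k=2):
    """Return the top_k song titles whose topic distributions
    diverge least from the query distribution."""
    scored = sorted(
        (kl_divergence(query, dist), title) for title, dist in catalog.items()
    )
    return [title for _, title in scored[:top_k]]

# Hypothetical song-topic distributions over three topics.
catalog = {
    "song_a": [0.7, 0.2, 0.1],
    "song_b": [0.1, 0.1, 0.8],
    "song_c": [0.6, 0.3, 0.1],
}
query = [0.62, 0.28, 0.10]

print(recommend(query, catalog))
```

Lower divergence means closer topic profiles, so the list is sorted in ascending order.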
💬 Speech Recognition & Evaluation
Recognition and Evaluation of Oral English
- Designed and optimized Goodness of Pronunciation (GOP) features; implemented and tuned LR, XGBoost, and LSTM classifiers. Attained a state-of-the-art consistency rate for oral English evaluation. [patented]
- Built and optimized a full chain-model training pipeline on the Kaldi framework, including corpus crawling, language and acoustic model training, Bi-RNN implementation, and RNN rescoring.
- Achieved 5%-10% WER on various benchmark datasets and outperformed the Google ASR API on children's speech datasets.
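
For context on the WER figures, word error rate is the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The sketch below is a minimal illustration, assuming whitespace tokenization and no text normalization. It is not the scoring tool used in the project, and `word_error_rate` is a hypothetical name.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)

wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER = {wer:.1%}")
```

In this toy pair there is one substitution and one deletion against six reference words.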
🗣️ Voice Conversion
Voice Conversion Timbre Similarity Improvement
- Method: Optimized the hidden-representation bottleneck of an any-to-one PPG-based VC pipeline. [patented]
- Result: Improved the similarity MOS of voice timbre from 3.9 to 4.3.
- Implemented many-to-many VC models such as VQ-VAE and StarGAN-VC for comparison.