🗂️️ Selected Projects

🎙 Speech Synthesis

Few-shot Voice Cloning and Style Transfer

Achieved few-shot voice cloning from 20 utterances. Timbre similarity MOS (SMOS) reached 4.6 with a MOS of 3.8, along with a clear pronunciation-correction effect for L2 English speakers. [patented]
Used a pre-train/fine-tune paradigm and frame-level pitch modeling to achieve few-shot style transfer from 20 utterances. Style SMOS improved from 3.5 to 4.5 while naturalness MOS remained above 4.0.

Music Recommendation System Based on Probabilistic Topic Models (Supervisor: Vladimir Pavlovic)

Used a CRNN for music tagging, and applied Latent Dirichlet Allocation (LDA) and Hierarchical Dirichlet Process (HDP) probabilistic topic models for music topic modeling.
Used KL divergence to measure the similarity between song-topic distributions for recommendation.
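
A minimal sketch of how KL divergence can rank songs by topic similarity. The symmetric averaging, the smoothing constant `eps`, the function names, and the toy `song_topics` data are all illustrative assumptions, not details of the original system.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same topics.

    eps is an assumed smoothing constant that avoids log(0) when a
    topic has zero probability in either distribution.
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same topics")
    p_s = [x + eps for x in p]
    q_s = [x + eps for x in q]
    zp, zq = sum(p_s), sum(q_s)  # renormalize after smoothing
    return sum(
        (a / zp) * math.log((a / zp) / (b / zq)) for a, b in zip(p_s, q_s)
    )


def symmetric_kl(p, q):
    """Average of KL(p || q) and KL(q || p).

    Assumption: KL is asymmetric, so a symmetric variant is used here
    to get an order-independent distance between two songs.
    """
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def recommend(query_id, song_topics, k=2):
    """Return the k songs whose topic distributions are closest to the query."""
    query = song_topics[query_id]
    scored = [
        (symmetric_kl(query, dist), song_id)
        for song_id, dist in song_topics.items()
        if song_id != query_id
    ]
    scored.sort()  # smaller divergence means more similar
    return [song_id for _, song_id in scored[:k]]


# Hypothetical song-topic distributions (e.g. as produced by LDA or HDP).
song_topics = {
    "song_a": [0.7, 0.2, 0.1],
    "song_b": [0.6, 0.3, 0.1],
    "song_c": [0.1, 0.1, 0.8],
}
```

Calling `recommend("song_a", song_topics, k=1)` on this toy data ranks `song_b` ahead of `song_c`, because its topic mixture is much closer to that of `song_a`.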

Recognition and Evaluation of Oral English

Designed and optimized the Goodness of Pronunciation (GOP) feature; implemented and tuned LR, XGBoost, and LSTM classifiers. Attained a state-of-the-art consistency rate for spoken-English evaluation. [patented]
Trained and optimized a full chain-model pipeline on the Kaldi framework, including corpus crawling, language and acoustic model training, Bi-RNN implementation, RNN rescoring, etc.
Achieved 5%-10% WER on various benchmark datasets and outperformed the Google ASR API on children's speech datasets.
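
For reference, WER (word error rate) is the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below is a generic illustration with a hypothetical function name and whitespace tokenization; it is not the scoring tool used in the project.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count.

    Assumes whitespace tokenization and no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                       # deletion
                    curr[j - 1] + 1,                   # insertion
                    prev[j - 1] + substitution_cost,   # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "the cat sit on mat" against the reference "the cat sat on the mat" gives one substitution and one deletion over six reference words, so the WER is 2/6.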

Voice Conversion Timbre Similarity Improvement

Method: Optimized the hidden-representation bottleneck of an any-to-one, PPG-based VC pipeline. [patented]
Result: Improved the similarity MOS of voice timbre from 3.9 to 4.3.
Implemented many-to-many VC models such as VQ-VAE and StarGAN-VC for comparison.