Text-to-Speech (TTS) systems have achieved remarkable proficiency in generating natural-sounding speech. However, the synthesized output often sounds robotic or machine-like because it lacks the emotional qualities of human speech. To overcome this constraint, an approach for emotional TTS is presented. Using emotion vectors extracted from expressive voice samples, this method enriches the text embedding with emotional information, building on pre-trained models such as Tacotron. The paper adapts a current TTS framework to combine emotion vectors with the textual input and produce emotionally expressive voice output. This research aims to make important improvements in the field of TTS synthesis, ultimately improving the quality and naturalness of human-computer interactions.
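The conditioning idea described above — combining an utterance-level emotion vector with the per-token text embedding before it enters the TTS decoder — can be sketched as follows. This is a minimal illustration only; the function name, dimensions, and broadcast-then-concatenate strategy are assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

def condition_on_emotion(text_emb, emotion_vec):
    """Broadcast an utterance-level emotion vector across every
    text-embedding timestep and concatenate it feature-wise.

    text_emb:    (T, D_text) per-token text embeddings
    emotion_vec: (D_emo,)    emotion vector from an expressive reference sample
    returns:     (T, D_text + D_emo) emotion-conditioned embeddings
    """
    T = text_emb.shape[0]
    tiled = np.tile(emotion_vec, (T, 1))  # repeat the vector for each timestep: (T, D_emo)
    return np.concatenate([text_emb, tiled], axis=1)

# Toy example: 5 tokens, 8-dim text embeddings, 4-dim emotion vector
text_emb = np.random.randn(5, 8)
emotion_vec = np.random.randn(4)
conditioned = condition_on_emotion(text_emb, emotion_vec)
print(conditioned.shape)  # (5, 12)
```

The conditioned sequence would then replace the plain text embedding as input to the synthesis model, so the decoder sees the emotional context at every timestep.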
Presentation Date: Nov. 14, 2024
Authors:
Yared Alemayehu Kebede
Rupesh Yadav
Sudip Thapa
AbdulRahman
Shweta Chauhan