Volume 2, Issue 11, November 2013
This work is licensed under a Creative Commons Attribution 4.0 International License.
Context Based Segmentation and Spectral Mismatch Reduction for More Naturalness with Application to Text to Speech (TTS) for Marathi Language
Mrs. Smita Kawachale, Research Scholar, Electronics Department, Bharati Vidyapeeth College of Engineering, Pune, India
Dr. J. S. Chitode, Honorary Professor, Electronics Department, Bharati Vidyapeeth College of Engineering, Pune, India
Abstract: The field of text-to-speech (TTS) synthesis has been developing rapidly and has widespread applications. There is great demand for TTS synthesis for Indian languages, whereas TTS systems for English and the world’s most widely used languages have already been developed. The proposed work is a TTS system for the Marathi language that is capable of speaking Marathi text. It uses a ‘Hybrid Syllabic Approach’, in which new words are formed and spoken from syllables derived from existing words in the database. Syllable-based speech synthesis relies on Consonant Vowel (CV) structure rules. An optimized soft-cutting (segmentation) approach is followed to obtain more naturalness and an improved context-based database.
The proposed work focuses on improving the naturalness of TTS through context-based segmentation, which is based on syllable position (I-Initial, M-Medium, F-Final), and therefore on position-dependent (I/M/F) speech synthesis. Concatenating position-dependent syllables may result in less spectral mismatch (concatenation cost) and give more natural-sounding audio output. Analysing the spectral mismatch at the joins makes it possible to improve the naturalness and overall performance of the TTS system. Spectral mismatch reduction is carried out using different time-domain and frequency-domain parameters. The performance of the proposed method is evaluated using subjective and objective validation methods.
Keywords: Text to Speech System, Spectral Smoothing, Concatenative TTS, Speech Synthesizer.
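The position-dependent selection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the romanized syllables, the treatment of a single-syllable word as initial, and the use of a Euclidean distance between magnitude spectra as the concatenation cost are all assumptions made for illustration, since the abstract does not specify which time-domain and frequency-domain parameters are used.

```python
import cmath
import math


def position_labels(syllables):
    """Tag each syllable of one word as I (initial), M (medium) or F (final).

    Assumption: a single-syllable word is tagged I.
    """
    last = len(syllables) - 1
    labels = []
    for i, syl in enumerate(syllables):
        if i == 0:
            labels.append((syl, "I"))
        elif i == last:
            labels.append((syl, "F"))
        else:
            labels.append((syl, "M"))
    return labels


def magnitude_spectrum(frame):
    """Magnitude of a naive DFT for bins 0..n/2 (fine for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_mismatch(left_frame, right_frame):
    """Hypothetical concatenation cost: Euclidean distance between the
    magnitude spectra of the frames on either side of a join.
    Both frames are assumed to have the same length."""
    a = magnitude_spectrum(left_frame)
    b = magnitude_spectrum(right_frame)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pick_unit(prev_tail, candidates):
    """Return the id of the candidate whose first frame joins most
    smoothly to prev_tail. candidates: list of (unit_id, head_frame)."""
    best = min(candidates, key=lambda c: spectral_mismatch(prev_tail, c[1]))
    return best[0]


def tone(bin_index, n=32):
    """Synthetic test frame: a sine with bin_index cycles per frame."""
    return [math.sin(2 * math.pi * bin_index * t / n) for t in range(n)]
```

In this sketch, `position_labels(["ma", "ra", "thi"])` tags the three syllables I, M and F in order, and `pick_unit` prefers the candidate whose boundary spectrum is closest to that of the preceding unit. A real system would compare frames cut from recorded syllables rather than synthetic tones.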
How to Cite:
[1] Smita Kawachale and J. S. Chitode, “Context Based Segmentation and Spectral Mismatch Reduction for More Naturalness with Application to Text to Speech (TTS) for Marathi Language,” International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), vol. 2, no. 11, Nov. 2013.
