Speech synthesis based on Hidden Markov Models and deep learning

Coto Jiménez, Marvin; Goddard Close, John

Files

rcs-112-1-2-with-cover-page-v2.pdf (1.24 MB)

Date

2016

Abstract

Speech synthesis based on Hidden Markov Models (HMM) and other statistical parametric techniques has been a hot topic for some time. Using these techniques, speech synthesizers are able to produce intelligible and flexible voices. Despite this progress, the quality of voices produced by statistical parametric synthesis has not yet reached the level of the currently predominant unit-selection approaches, which select and concatenate recordings of real speech. Researchers now strive to create models that more accurately mimic human voices. In this paper, we present our proposal to incorporate recent deep learning algorithms, especially Long Short-Term Memory (LSTM) networks, to improve the quality of HMM-based speech synthesis. Thus far, the results indicate that the spectral characteristics of HMM-based voices can be improved using this approach, but additional research is needed to improve other parameters of the voice signal, such as energy and fundamental frequency, to obtain more natural-sounding voices.
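The abstract does not specify the network architecture, so the following is only a minimal sketch of one plausible reading: a single-layer LSTM applied frame by frame to a sequence of HMM-generated spectral vectors (for example, mel-cepstral frames), producing an "enhanced" vector per frame. The class name `LSTMPostfilter`, the feature and hidden sizes, the random untrained weights, and the linear output layer are all hypothetical illustration choices, not the authors' implementation; training on pairs of synthetic and natural frames is omitted.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Dense matrix-vector product on plain Python lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


class LSTMPostfilter:
    """Hypothetical single-layer LSTM mapping HMM-generated spectral frames
    to enhanced frames of the same dimension. Weights are random and
    untrained; a real system would learn them from aligned
    synthetic/natural frame pairs."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.n_hidden = n_hidden
        # One weight matrix per gate (input, forget, output, candidate),
        # each acting on the concatenation [x_t; h_{t-1}].
        self.W = {gate: mat(n_hidden, n_in + n_hidden) for gate in "ifoc"}
        self.b = {gate: [0.0] * n_hidden for gate in "ifoc"}
        # Linear layer projecting the hidden state back to the feature size.
        self.W_out = mat(n_in, n_hidden)
        self.b_out = [0.0] * n_in

    def forward(self, frames):
        h = [0.0] * self.n_hidden  # hidden state
        c = [0.0] * self.n_hidden  # cell state
        outputs = []
        for x in frames:
            z = list(x) + h
            pre = {
                gate: [a + b for a, b in zip(matvec(self.W[gate], z), self.b[gate])]
                for gate in "ifoc"
            }
            i = [sigmoid(v) for v in pre["i"]]
            f = [sigmoid(v) for v in pre["f"]]
            o = [sigmoid(v) for v in pre["o"]]
            cand = [math.tanh(v) for v in pre["c"]]
            # Cell update: keep part of the old memory, add gated new content.
            c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, cand)]
            h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
            y = [a + b for a, b in zip(matvec(self.W_out, h), self.b_out)]
            outputs.append(y)
        return outputs


# Toy usage: 5 frames of hypothetical 4-dimensional spectral features.
hmm_frames = [[0.1 * t + 0.01 * k for k in range(4)] for t in range(5)]
model = LSTMPostfilter(n_in=4, n_hidden=8)
enhanced = model.forward(hmm_frames)
print(len(enhanced), len(enhanced[0]))  # 5 4
```

The recurrent cell state is what lets such a model use context from earlier frames when correcting the current one, which is the usual motivation for choosing an LSTM over a frame-independent mapping.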

Keywords

Long short-term memory (LSTM), Hidden Markov Models (HMM), Speech synthesis, Statistical parametric speech synthesis, Deep learning

Citation

https://www.rcs.cic.ipn.mx/2016_112/

URI

https://hdl.handle.net/10669/86307

Collections

Ingeniería eléctrica
